Week 1: Linear Models I

Introduction to Analytics in Finance

Katerina Chrysikou

KBS

Week 1

Katerina Chrysikou (KBS) Week 1 1 / 56


Outline

Introduction
Typical Problems
Data and Notation
Model formulation
The classical Linear Regression Model
Univariate regression Models
Ordinary Least Squares
Deriving the Estimator
Assumptions
Properties of Estimators
Statistical Inference
Testing Hypotheses
Empirical analysis
References



Finance questions that may be solved by
Econometricians I

Testing whether financial markets are weak-form informationally efficient.
Testing whether the CAPM or APT represent superior models for
the determination of returns on risky assets.
Measuring and forecasting the volatility of bond returns.
Explaining the determinants of bond credit ratings used by the
ratings agencies.
Modelling long-term relationships between prices and exchange
rates.



Finance questions that may be solved by
Econometricians II

Determining the optimal hedge ratio for a spot position in oil.


Testing technical trading rules to determine which makes the
most money.
Testing the hypothesis that earnings or dividend announcements
have no effect on stock prices.
Testing whether spot or futures markets react more rapidly to
news.
Forecasting the correlation between the returns to the stock
indices of two countries.



Special Characteristics and Types of Data

Frequency and quantity of data: Stock market prices are measured every time there is a trade or somebody posts a new quote.
Quality: Recorded asset prices are usually those at which the
transaction took place. No possibility for measurement error but
financial data are “noisy”.
There are 3 types of data which econometricians might use for
analysis:
1 Time series data;
2 Cross-sectional data;
3 Panel data, a combination of 1. and 2.



Time Series Data I

Examples of time series data:

Series                                        Frequency
GDP, industrial production, unemployment      monthly, or quarterly
government budget deficit                     annually
money supply                                  weekly
value of a stock market index                 end of day
value of a stock price                        as transactions occur



Time Series Data II

Examples of Problems that Could be Tackled Using a Time Series Regression:
How the value of a country’s stock index has varied with that
country’s macroeconomic fundamentals.
How the value of a company’s stock price has varied when it
announced the value of its dividend payment.
The effect on a country’s currency of an increase in its interest
rate.



Cross-sectional Data

Cross-sectional data are data on one or more variables collected at a single point in time, e.g.
A poll of usage of internet stock broking services
Cross-section of stock returns on the New York Stock Exchange
A sample of bond credit ratings for UK banks

Examples of Problems that Could be Tackled Using a Cross-Sectional Regression:
The relationship between company size and the return to investing
in its shares.
The relationship between a country’s GDP level and the probability
that the government will default on its sovereign debt.



Notation

A usual notation includes the following symbols:


denote each observation by the letter t and the total number of
observations by T for time series data, t = 1, 2, ..., T
denote each unit by the letter i and the total number of
observations by N for cross-sectional data, i = 1, 2, ..., N.

Example:
The stock price of asset i at time t: pit.



Prices of Financial Assets
# Make a plot of the price series
# (assumes d = vector of dates and p = vector of prices are already loaded)
plot(d, p, main="SPY", type="l", xlab="Time", ylab="Prices ($)")

Figure: SPY prices ($) over time, 2005–2015.
Returns for Financial Assets

It is preferable not to work directly with asset prices (why???), so we usually convert the raw prices into a series of returns. There are two ways to do this. Let Rt denote the return at time t and pt the asset price at time t.
Simple (percentage) returns → modelling & calculation of profits:

    Rt = (pt − pt−1) / pt−1   (×100%)

Log returns → modelling only:

    Rt = log(pt / pt−1)   (×100%)

We also ignore any dividend payments, or alternatively assume that the price series have already been adjusted to account for them.
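Both definitions are one-liners in R; the price series below is assumed purely for illustration:

```r
# Hypothetical price series (values assumed for illustration)
p <- c(100, 102, 101, 105)
simple <- diff(p) / head(p, -1) * 100   # simple percentage returns
logret <- diff(log(p)) * 100            # log (continuously compounded) returns
round(cbind(simple, logret), 3)
```

Note how close the two are for small price changes.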



Log Returns

Log returns are also known as log price relatives. There are a number of reasons for using them:
They have the nice property that they can be interpreted as
continuously compounded returns.
Can add them up, e.g. if we want a weekly return and we have
calculated daily log returns:
    r1 = log(p1/p0) = log p1 − log p0
    r2 = log(p2/p1) = log p2 − log p1
    r3 = log(p3/p2) = log p3 − log p2
    r4 = log(p4/p3) = log p4 − log p3
    r5 = log(p5/p4) = log p5 − log p4

Summing, r1 + r2 + r3 + r4 + r5 = log p5 − log p0 = log(p5/p0).
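This additivity is easy to verify numerically (the prices below are assumed for illustration):

```r
# Six daily prices p0..p5 (values assumed)
p <- c(100, 101, 103, 102, 104, 106)
r <- diff(log(p))        # daily log returns r1..r5
sum(r)                   # weekly log return: sum of the daily log returns
log(p[6] / p[1])         # the same number, log(p5/p0)
```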



Disadvantage of Log Returns

The simple return on a portfolio of assets is a weighted average of the simple returns on the individual assets:

    Rpt = ∑_{i=1}^{N} wi Rit,

where wi is the weight of asset i in the portfolio.

But this does not work for the continuously compounded returns.
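A small numerical sketch (returns and weights assumed for illustration) of why additivity across assets holds for simple returns but fails for log returns:

```r
# Two assets, equal weights (numbers assumed for illustration)
R1 <- 0.10; R2 <- -0.05
w  <- c(0.5, 0.5)
Rp <- w[1]*R1 + w[2]*R2                # exact for simple returns
log(1 + Rp)                            # portfolio log return
w[1]*log(1 + R1) + w[2]*log(1 + R2)    # weighted average of log returns: differs
```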



Steps involved in the formulation of econometric
models

1 Economic or Financial Theory (Previous Studies)


2 Collection of Data
3 Formulation of a (testable) Theoretical Model
4 Model Estimation. Is the Model Statistically Adequate?
No:
Reformulate Model (go back to Step 3)
Yes:
Interpret Model
Use for Analysis



Regression

Regression is probably the single most important tool at the econometrician’s disposal.
But what is regression analysis?
It is concerned with describing and evaluating the relationship
between a given variable (usually called the dependent variable)
and one or more other variables (usually known as the
independent variable(s)).



Notation

Denote the dependent variable by y

and the independent variable(s) by x1 , x2 , . . . , xk , where there are k independent variables.

Note that there can be many x variables

but we will limit ourselves to the case where there is only one x
variable to start with.

In our set-up, there is only one y variable.



Regression vs Correlation

If we say y and x are correlated, it means they are linearly associated and that we are treating y and x in a completely symmetrical way.
It does not imply that changes in x cause changes in y.

In regression, we treat the dependent variable (y) and the independent variable(s) (xs) very differently.
The y variable is assumed to be random or “stochastic” in some
way, i.e. to have a probability distribution.
The x variables are, however, assumed to have fixed
(“non-stochastic”) values in repeated samples.



Simple Regression

For simplicity, say k = 1.

This is the situation where y depends on only one x variable.

Examples of the kind of relationship that may be of interest include:
How asset returns vary with their level of market risk.
Measuring the long-term relationship between stock prices and
dividends.
Constructing an optimal hedge ratio.



Simple Regression I

Suppose that we have the following data on the excess returns on a fund manager’s portfolio (“fund XXX”) together with the excess returns on a market index:

Year t    Excess return rXXX,t − rf,t    Excess return on market index rm,t − rf,t
1         17.8                           13.7
2         39.0                           23.2
3         12.8                            6.9
4         24.2                           16.8
5         17.2                           12.3
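For reference, one way to enter these data in R — the vector names `mi` and `mp` below are the ones the later code snippets assume:

```r
# Enter the table above into R
mp <- c(17.8, 39.0, 12.8, 24.2, 17.2)   # excess returns on fund XXX (y)
mi <- c(13.7, 23.2,  6.9, 16.8, 12.3)   # excess returns on market index (x)
```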



Simple Regression II

We have some intuition that the beta on this fund is positive.

We therefore want to find whether there appears to be a relationship between x and y given the data that we have.

The first stage would be to form a scatter plot of the two variables.



Return Plot
# Specify the axes ranges if you want, using xlim and ylim
# (mi = market index excess returns, mp = fund XXX excess returns)
plot(mi, mp, main="Scatterplot", xlab="Market Index", ylab="Fund XXX",
     type="p", pch=19, col="blue", cex=2, xlim=c(0,25), ylim=c(0,45))

Figure: Scatterplot of the excess returns on fund XXX against the excess returns on the market index.


Finding a Line of Best Fit (Regression Line)

We can use the general equation for a straight line,

y = a + bx

to get the line that best “fits” the data.

However, this equation (y = a + bx) is completely deterministic. Is this realistic?

No. So what we do is to add a random disturbance term, u, into the equation:

    yt = α + βxt + ut ,   t = 1, 2, 3, 4, 5.



Why a Disturbance Term?

The disturbance term can capture a number of features:


We always leave out some determinants of yt
There may be errors in the measurement of yt that cannot be
modelled.
Random outside influences on yt which we cannot model

So how do we determine what α and β are?


Choose α and β so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible):



Determining the Regression Coefficient
Figure: The scatterplot again; the regression line is chosen to minimise the vertical distances from the data points to the line.
Ordinary Least Squares

The most common method used to fit a line to the data is known as Ordinary Least Squares (OLS).

What we actually do is take each distance, square it, and minimise the total sum of the squares (hence least squares).

Tightening up the notation, let:


yt denote the actual data point
ŷt denote the fitted value from the regression line
ût denote the residual yt − ŷt



How OLS Works

So

    min (û1² + û2² + û3² + û4² + û5²) = min ∑ ût² ,  summing over t = 1, . . . , 5.

This is known as the residual sum of squares.


But what was ut? It was the difference between the actual point and the line, yt − α − βxt. So minimising ∑(yt − α − βxt)² is equivalent to minimising ∑ ût² with respect to α and β.



Deriving the OLS Estimator I
Let L = ∑(yt − α − βxt)². We want to minimise L w.r.t. α and β. So the FOC are:

    ∂L/∂α = −2 ∑(yt − α − βxt) = 0,   (1)
    ∂L/∂β = −2 ∑ xt (yt − α − βxt) = 0.   (2)

From (1),

    ∑ yt − T α̂ − β̂ ∑ xt = 0.

But ∑ yt = T ȳ and ∑ xt = T x̄. So we can write ȳ − α̂ − β̂ x̄ = 0, and obtain

    α̂ = ȳ − β̂ x̄.   (3)



Deriving the OLS Estimator II

We can substitute (3) into (2) and get

    ∑ xt (yt − ȳ + β̂ x̄ − β̂ xt) = 0
    ∑ xt yt − ȳ ∑ xt + β̂ x̄ ∑ xt − β̂ ∑ xt² = 0
    ∑ xt yt − T ȳ x̄ + β̂ T x̄² − β̂ ∑ xt² = 0

from which one easily gets

    β̂ = (∑ xt yt − T x̄ ȳ) / (∑ xt² − T x̄²).
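As a check, the closed-form expressions can be computed directly for the fund XXX data introduced earlier and compared with `lm()`:

```r
# Compute alpha-hat and beta-hat from the closed-form expressions
# (data from the fund XXX example table)
mp <- c(17.8, 39.0, 12.8, 24.2, 17.2)   # excess returns on fund XXX (y)
mi <- c(13.7, 23.2,  6.9, 16.8, 12.3)   # excess returns on market index (x)
T  <- length(mp)
b.hat <- (sum(mi*mp) - T*mean(mi)*mean(mp)) / (sum(mi^2) - T*mean(mi)^2)
a.hat <- mean(mp) - b.hat*mean(mi)
round(c(alpha = a.hat, beta = b.hat), 3)   # same values as coef(lm(mp ~ mi))
```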



What Do We Use α̂ and β̂ For?

In the CAPM example used above, plugging the 5 observations into the formulae given above leads to the estimates α̂ = −1.74 and β̂ = 1.64.
# Do the regression and save the regression output
out.reg <- lm(mp~mi)

> lm(mp~mi)

Call:
lm(formula = mp ~ mi)

Coefficients:
(Intercept) mi
-1.737 1.642



We could write the fitted line as:

ŷt = −1.74 + 1.64xt .

Question: If an analyst tells you that she expects the market to yield a
return 20% higher than the risk-free rate next year, what would you
expect the return on fund XXX to be?
Solution: The expected value of y is −1.74 + 1.64 × (value of x), so plug x = 20 into the fitted equation to get the expected value for y:

ŷt = −1.74 + 1.64 × 20 = 31.06.
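The same prediction can be obtained with `predict()`; the snippet re-fits the regression so it is self-contained (31.06 above uses the rounded coefficients, so the unrounded prediction is about 31.1):

```r
# Prediction at mi = 20 via predict()
mp <- c(17.8, 39.0, 12.8, 24.2, 17.2)
mi <- c(13.7, 23.2,  6.9, 16.8, 12.3)
out.reg <- lm(mp ~ mi)
predict(out.reg, newdata = data.frame(mi = 20))   # about 31.1
```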



Linearity I

In order to use OLS, we need a model which is linear in the parameters. It does not necessarily have to be linear in the variables. Linear in the parameters means that the parameters are not multiplied together, divided, squared or cubed, etc.

Some models can be transformed to linear ones by a suitable substitution or manipulation, e.g. the exponential regression model

    Yt = e^α Xt^β e^(ut).

Then let yt = log Yt and xt = log Xt and get

yt = α + βxt + ut .
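A quick simulated sketch (all parameter values here are assumed for illustration) showing that OLS on the logged model recovers α and β:

```r
# Simulate the exponential model and recover (alpha, beta) by OLS on logs
set.seed(1)
alpha <- 0.5; beta <- 2
X <- exp(rnorm(100))                  # a positive regressor
u <- rnorm(100, sd = 0.1)
Y <- exp(alpha) * X^beta * exp(u)     # Y_t = e^alpha * X_t^beta * e^(u_t)
fit <- lm(log(Y) ~ log(X))            # linear in the parameters
coef(fit)                             # close to (0.5, 2)
```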



Linearity II

Similarly, if theory suggests that y and x should be inversely related:

    yt = α + β/xt + ut ,

then the regression can be estimated using OLS by substituting

    zt = 1/xt .

But some models are intrinsically non-linear, e.g.

    yt = α + xt^β + ut .



The assumptions underlying the CLRM

We usually make the following set of assumptions:


1 E(ut) = 0
2 var(ut) = σ² < ∞
3 cov(ut , us) = 0 for t ≠ s
4 cov(xt , ut) = 0

A fifth assumption is required if we want to make exact inference about the population parameters (the actual α and β) from the sample parameters (α̂ and β̂). Additional assumption:

5 ut is normally distributed.



Properties of the OLS Estimator

If Assumptions 1 through 4 hold, then the estimators obtained by OLS are known as Best Linear Unbiased Estimators (BLUE).

What does the acronym stand for?


Best: Estimator with the minimum variance within the class of
linear unbiased estimators (Gauss-Markov theorem).
Linear: It is a linear estimator.
Unbiased: Its expectation is the true value of the parameter.



Consistency/Unbiasedness/Efficiency

Consistency: means that the estimates will converge to the true value as the sample size increases. Formally,

    lim(T→∞) Pr(|β̂ − β| > δ) = 0, for all δ > 0.

Unbiasedness: implies that

    E(β̂) = β.

Efficiency: relates to the variance of the estimator.



Standard Errors

We need some measure of the reliability or precision of the estimators (α̂ and β̂). The precision of an estimate is given by its standard error. Given Assumptions 1–4 above, the standard errors can be shown to be given by

    SE(α̂) = s √[ ∑ xt² / (T ∑(xt − x̄)²) ] = s √[ ∑ xt² / (T ∑ xt² − T² x̄²) ],

    SE(β̂) = s √[ 1 / ∑(xt − x̄)² ] = s √[ 1 / (∑ xt² − T x̄²) ],

where s is the estimated standard error of the residuals.



Estimating the Variance of the Disturbance Term

The variance of the random variable ut is given by:

    var(ut) = E[(ut − E(ut))²] = E(ut²).

Since the ut are not observable, we use the residuals:

    s² = (1/T) ∑ ût².

But this estimator is a biased estimator of σ². An unbiased estimator is given by:

    s² = (1/(T − 2)) ∑ ût².
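Using the fund XXX data from earlier, the unbiased formula reproduces the residual standard error that R reports (the snippet re-fits the regression so it is self-contained):

```r
# Reproduce R's residual standard error by hand (fund XXX data)
mp <- c(17.8, 39.0, 12.8, 24.2, 17.2)
mi <- c(13.7, 23.2,  6.9, 16.8, 12.3)
out.reg <- lm(mp ~ mi)
T <- length(mp)
s <- sqrt(sum(resid(out.reg)^2) / (T - 2))  # divide by T - 2, not T
s                                           # matches summary(out.reg)$sigma
```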



Use the previous example and calculate!

# Break the formulae into pieces so you can follow the algebra
# (assumes x = regressor values, t = number of observations, and
#  s = residual standard error are already defined in the session)
sx2 <- sum(x^2)
SEa.denom1 <- t*sx2
SEa.denom2 <- (t^2) * (mean(x)^2)
SEa <- s * sqrt( sx2 / (SEa.denom1 - SEa.denom2) )
SEb <- s * sqrt( 1 / (sx2 - t*(mean(x)^2)) )

> SEa
[1] 4.113994

> SEb
[1] 0.2647783



Also check the regression output!

# Check the standard error


> summary(out.reg)

Call:
lm(formula = mp ~ mi)

Residuals:
1 2 3 4 5
-2.955 2.648 3.209 -1.645 -1.257

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.7366 4.1140 -0.422 0.70135
mi 1.6417 0.2648 6.200 0.00845 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.179 on 3 degrees of freedom


Multiple R-squared: 0.9276,Adjusted R-squared: 0.9035
F-statistic: 38.45 on 1 and 3 DF, p-value: 0.008452



Some Comments on the Standard Error Estimators I

1 Both SE(α̂) and SE( β̂) depend on s2 . The greater the variance s2 ,
then the more dispersed the errors are about their mean value and
therefore the more dispersed y will be about its mean value.

2 The sum of the squares of x about their mean appears in both formulae. The larger the sum of squares, the smaller the coefficient variances.



Small ∑(xt − x̄)²
# See what a small dispersion looks like
# by generating 20 random numbers (all very close to their mean value)
y <- rnorm(20, mean=5, sd=0.5)
x <- rnorm(20, mean=5, sd=0.5)
plot(x, y, type="p", pch=19, col="blue", cex=1.2, xlim=c(0,10), ylim=c(0,10))
abline(lm(y~x), col="red", lwd=2)

Figure: A scatterplot in which x has small dispersion about its mean.
Large ∑(xt − x̄)²
y <- rnorm(20, mean=5, sd=0.5)
x <- rnorm(20, mean=5, sd=2)
plot(x, y, type="p", pch=19, col="blue", cex=1.2, xlim=c(0,10), ylim=c(0,10))
abline(lm(y~x), col="red", lwd=2)

Figure: A scatterplot in which x has large dispersion about its mean.
Some Comments on the Standard Error Estimators II

The larger the sample size, T, the smaller will be the coefficient
variances.

T appears explicitly in SE(α̂) and implicitly in SE( β̂).

T appears implicitly since the sum ∑(xt − x)2 is from t = 1 to T.



An Introduction to Statistical Inference

We want to make inference about the likely population values from the
regression parameters.

From the regression results we have a single (point) estimate of the unknown population parameter. How “reliable” is this estimate?

The reliability of the point estimate is measured by the coefficient’s standard error.



Hypothesis Testing: Some Concepts

We can use the information in the sample to make inferences about the population.

We will always have two hypotheses that go together: the null hypothesis (denoted H0) and the alternative hypothesis (denoted H1). The null hypothesis is the statement or the statistical hypothesis that is actually being tested. The alternative hypothesis represents the remaining outcomes of interest.

For example, suppose that, given the regression results above, we are interested in the hypothesis that the true value of β is in fact 0.5. We would use the notation

    H0 : β = 0.5
    H1 : β ≠ 0.5

This would be known as a two-sided test.
The Probability Distribution of the Least Squares
Estimators I

We assume that ut ∼ N(0, σ2 )

The least squares estimators are linear combinations of the random variables, i.e. β̂ = ∑ wt yt.

Since a weighted sum of normal random variables is itself normally distributed, it follows that

    α̂ ∼ N(α, var(α̂)) and β̂ ∼ N(β, var(β̂)).
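This can be illustrated with a small Monte Carlo sketch (all parameter values here are assumed for illustration): drawing many samples with normal errors, the β̂ estimates centre on the true β.

```r
# Monte Carlo sketch: sampling distribution of beta-hat under normal errors
set.seed(42)
alpha <- 1; beta <- 2
x <- 1:50
b.hats <- replicate(2000, {
  y <- alpha + beta*x + rnorm(50)   # fresh normal errors each sample
  coef(lm(y ~ x))[2]                # record beta-hat
})
c(mean = mean(b.hats), sd = sd(b.hats))  # centred on beta = 2
```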



The Probability Distribution of the Least Squares
Estimators II

What if the errors are not normally distributed?

Will the parameter estimates still be normally distributed?

Yes, approximately, if the other assumptions of the CLRM hold and the sample size is sufficiently large (the estimators are then asymptotically normally distributed).

Standard normal variates can be constructed from α̂ and β̂:

    (α̂ − α)/√var(α̂) ∼ N(0, 1) and (β̂ − β)/√var(β̂) ∼ N(0, 1).



The Probability Distribution of the Least Squares
Estimators III

But var(α̂) and var(β̂) are unknown.

So we have

    (α̂ − α)/SE(α̂) ∼ t(T−2) and (β̂ − β)/SE(β̂) ∼ t(T−2).



Testing Hypotheses: The Test of Significance
Approach I

Assume the regression equation is given by

yt = α + βxt + ut , t = 1, 2, . . . , T.

The steps involved in doing a test of significance are:


1 Calculate α̂, β̂ and SE(α̂), SE( β̂) in the usual way.
2 Calculate the test statistic. This is given by the formula

    (β̂ − β*) / SE(β̂),

where β* is the value of β under the null hypothesis.
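Using the estimates from the fund XXX regression output reported earlier (β̂ = 1.6417, SE = 0.2648), a sketch of the computation for the illustrative null H0 : β = 0.5:

```r
# Test statistic for H0: beta = 0.5, numbers from the regression output
b.hat <- 1.6417
se.b  <- 0.2648
(b.hat - 0.5) / se.b   # about 4.31
```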



Testing Hypotheses: The Test of Significance
Approach II

We need some tabulated distribution with which to compare the estimated test statistic.

Test statistics derived in this way can be shown to follow a t-distribution with T − 2 degrees of freedom.

As the number of degrees of freedom increases, we need to be less cautious in our approach, since we can be more confident that our results are robust.



Testing Hypotheses: The Test of Significance
Approach III

We need to choose a “significance level”, often denoted α (which is different from the regression constant!).

This is also sometimes called the size of the test and it determines the
region where we will reject or not reject the null hypothesis that we are
testing.

It is conventional to use a significance level of 5%. The intuitive explanation is that we would expect a result as extreme as this, or more extreme, only 5% of the time as a consequence of chance alone.

A 5% size of test is conventional, but 10% and 1% are also commonly used.



Testing Hypotheses: The Test of Significance
Approach IV

Use the t-tables to obtain a critical value or values with which to compare the test statistic.

Finally, perform the test. If the test statistic lies in the rejection region
then reject the null hypothesis (H0 ), else do not reject H0 .
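A sketch of the full test in R for the fund example (5% two-sided test; the estimate and standard error are taken from the regression output reported earlier):

```r
# Critical value from the t-distribution via qt(), T = 5 observations
T <- 5
crit <- qt(0.975, df = T - 2)          # two-sided 5% critical value, 3 df
test.stat <- (1.6417 - 0.5) / 0.2648   # test statistic for H0: beta = 0.5
abs(test.stat) > crit                  # TRUE: in the rejection region, reject H0
```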



A Special Type of Hypothesis Test: The t-ratio I

Recall that the formula for the test of significance approach to hypothesis testing using a t-test was

    (β̂ − β*) / SE(β̂).

If the test is

    H0 : β = 0,
    H1 : β ≠ 0,

i.e. a test that the population coefficient is zero against a two-sided alternative, this is known as a t-ratio test.



A Special Type of Hypothesis Test: The t-ratio II

Since β* = 0, the test statistic is

    β̂ / SE(β̂).

t-ratios are conventionally reported by all the econometric packages.



Calculate the t-ratios

# Calculate the t-ratios for true values equal to 0


tra <- (out.reg$coefficients[1] - 0)/SEa
trb <- (out.reg$coefficients[2] - 0)/SEb

> tra
-0.4221322

> trb
6.200453

# compare with the output


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.7366 4.1140 -0.422 0.70135
mi 1.6417 0.2648 6.200 0.00845 **



References

Papers:
Jensen, M. C. (1968). The performance of mutual funds in the period 1945–1964. Journal of Finance, 23(2), 389–416.
