Chapter 2 FECON


Chapter 2: Simple Linear Regression
Outline
✓ Concept of regression function
✓ Simple Linear Regression
✓ Assumptions of CLRM
✓ Methods of Estimation
✓ Properties of LS estimates
✓ Goodness of fit
✓ Confidence Intervals and Hypothesis Testing
Compiled By: Getaneh Y.(Assi. Prof.)
Econometrics for Finance Teaching Material
Introduction
• Theories in finance, business and economics are mainly concerned
with the relationships among various variables.
• When phrased in mathematical terms, such relationships can be used to predict the
effect of one variable on another.
• The functional relationship among these variables defines the dependence
of one variable upon the other variable(s) in a specific form.
• The specific functional forms may be linear, quadratic, logarithmic,
exponential, hyperbolic, or any other form.
• In this chapter we shall consider a simple linear regression model, i.e.
a relationship between two variables related in a linear form.
• We shall first discuss two important forms of relationship: stochastic and
non-stochastic, of which we shall use the former in econometric analysis.
Stochastic and Non-stochastic Relationships
• A relationship between X and Y, characterized as Y = f(X) is said to be
deterministic or non-stochastic if for each value of the independent
variable (X) there is one and only one corresponding value of
dependent variable (Y).
• On the other hand, a relationship between X and Y is said to be
stochastic if for a particular value of X there is a whole probabilistic
distribution of values of Y. In such a case, for any given value of X,
the dependent variable Y assumes some specific value only with some
probability.



Stochastic and Non-stochastic Relationships
• Let us consider the CAPM, which states that the excess return on an asset
depends on the excess return of the market index:
$R_x = R_f + \beta(R_m - R_f) \Rightarrow R_x - R_f = \beta(R_m - R_f)$; let $y = R_x - R_f$ and $x = R_m - R_f$.
• The function being linear, the relationship can be put as:
𝑦 = 𝑎 + β𝑥
• The above relationship between y and x is such that for a particular
value of x there is only one corresponding value of y; such a relationship
is deterministic (non-stochastic).
• This implies that all the variation in Y is due solely to changes in X,
and that there are no other factors affecting the dependent variable.



Stochastic and Non-stochastic Relationships
• However, if we gather observations on y (the excess asset return)
actually realized in the market at various levels of the excess market return and
plot them on a diagram, we see that they do not fall on a straight line.
Year, t                      1     2     3     4     5
Excess asset return (y)      17.8  39.0  12.8  24.2  17.2
Excess market return (x)     13.7  23.2  6.9   16.8  12.3
[Figure: scatter plot of excess asset return against excess market return, with fitted line y = 1.6417x - 1.7366]
❑ The deviation of the observations from the line may be attributed to several factors:
A. Omission of variables from the function
B. Random behavior of human beings
C. Imperfect specification of the mathematical form of the model
D. Error of aggregation
E. Error of measurement


Stochastic and Non-stochastic Relationships
• In order to take into account the above sources of error, we introduce
in econometric functions a random variable, usually denoted
by the letter ‘u’ or ‘ε’ and called the error term, random disturbance or
stochastic term of the function, so called because u is supposed to
‘disturb’ the exact linear relationship which is assumed to exist
between X and Y.
• By introducing this random variable in the function the model is
rendered stochastic of the form:
𝑦 = 𝑎 + β𝑥+ε
• Thus a stochastic model is a model in which the dependent variable is
not only determined by the explanatory variable(s) included in the
model but also by others which are not included in the model.



Regression Overview
What is regression?
• It is concerned with describing and evaluating the relationship between a given
variable (usually called the dependent variable) and one or more other variables
(usually known as the independent variable(s)).
• A variable is any characteristic or attribute which is subject to change and can
take more than one value. E.g. age is a variable and can take different values
such as 18, 25, 52, etc.
• Denote the dependent variable by y and the independent variable(s) by x1, x2, ..., xk, where there are k independent variables.
• Some alternative names for the y and x variables:
y (dependent variable)      x (independent variable(s))
Dependent variable          Independent variables
Regressand                  Regressors
Effect variable             Causal variables
Response variable           Stimulus variables
Explained variable          Explanatory variable
Endogenous variable         Exogenous variable
Controlled variable         Control variable
Predictand                  Predictor variable
• An independent variable is a variable presumed to influence another variable.
• A dependent variable is affected by the independent variable(s).
Population vs. Sample Regression
❖ The population regression function (PRF) is the hypothetical model present in
the population, and
❖ the sample regression function (SRF) is the model calculated from a sample
drawn from the population.
❖ The primary objective in regression analysis is to estimate the population
regression function on the basis of the sample regression function.
$Y_i = \alpha + \beta X_i$ ➢ where α & β are the population regression coefficients; α is the
intercept and β is the slope coefficient.
$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$ ➢ where $\hat{\alpha}$ & $\hat{\beta}$ are the sample regression coefficients; $\hat{\alpha}$ is the
intercept and $\hat{\beta}$ is the slope coefficient.
Regression is different from correlation
➢Correlations: measure the strength or degree of linear association
between two variables.
• E.g. Relation between scores on statistics and mathematics examinations
➢ In regression we are interested in estimating or predicting the average value
of one variable on the basis of the other variables.
➢In regression, we treat the dependent variable (y) and the independent
variable(s) (x’s) very differently.
✓ The y variable is assumed to be random or “stochastic” in some
way, i.e. to have a probability distribution.
✓ The x variables are, however, assumed to have fixed (“non-
stochastic”) values in repeated samples.
➢Types of regression: simple regression vs. multiple regression.
Simple Linear Regression model.
• The stochastic relationship (𝑦 = 𝑎 + β𝑥+ε) with one explanatory
variable is called simple linear regression model.
• The true relationship which connects the variables involved is split
into two parts:
✓a part represented by a line and a part represented by the random term ‘ε’.
$$\underbrace{Y_i}_{\text{dependent variable}} = \underbrace{\alpha + \beta x_i}_{\text{regression line}} + \underbrace{\varepsilon_i}_{\text{random variable}}$$
where i = 1, 2, …, n (n = 5 in the CAPM example).
• The first component (the regression line) is the part of Y explained by
changes in X, and the second (εi) is the part of Y not explained by X, that is,
the change in Y due to the random influence of εi.


Simple Linear Regression model…
Graphically:
[Figure: possible regression lines in simple linear regression]
Simple Linear Regression model…
❑ The scatter of observations represents the true relationship
between Y and X.

❑ The line represents the exact part of the relationship and the
deviation of the observation from the line represents the
random component of the relationship.

Compiled by Adino 12
Assumptions of the Classical Linear Stochastic
Regression Model
❑ Classical econometric analysis of regression rests on the following important assumptions:
A. The model is linear in the parameters.
B. Ui (εi) is a random real variable.
C. The mean value of the random variable (U) in any particular period is zero.
D. The variance of the random variable (U) is constant in each period (the assumption of homoscedasticity).
E. The random variable (U) has a normal distribution.


Assumptions of the Classical Linear Stochastic
Regression Model
F. The random terms of different observations are independent (the assumption of no autocorrelation).
G. The values of X are a set of fixed values in the hypothetical process of repeated sampling which underlies the linear regression model.
H. The random variable (U) is independent of the explanatory variables.
I. The explanatory variables are measured without error.
Assumptions of the CLRM…
A. The model is linear in parameters.
• The CLRM assumes that the model is linear in the parameters,
regardless of whether it is linear in the explanatory and dependent variables.
• This is because if the model is non-linear in the parameters, they are difficult to estimate,
since their values are not known.
• Example 1. $Y = \alpha + \beta x + u$ is linear in both the parameters and the variables, so
it satisfies the assumption.
• Example 2. $\ln Y = \alpha + \beta \ln x + u$ is linear only in the parameters. Since the assumption
concerns only the parameters, the model satisfies it.


Assumptions of the CLRM…
B. $\varepsilon_i$ (u) is a random real variable
• This means that the value which u may assume in any one period
depends on chance;
• It may be positive, negative or zero.
• Every value has a certain probability of being assumed by u in any
particular instance.



Assumptions of the CLRM…
C. The mean value of the random variable(U) in any particular
period is zero
• This means that for each value of x, the random variable(u) may
assume various values, some greater than zero and some smaller than
zero,
• But if we considered all the possible positive and negative values of u for any
given value of X, they would have an average value equal to zero.
• In other words, the positive and negative values of u cancel each other out.
• Mathematically, $E(U_i) = 0$.


Assumptions of the CLRM…
D) The variance of the random variable(U) is constant in
each period (The assumption of homoscedasticity)

➢ For all values of X, the u's will show the same dispersion around their mean.
➢ This assumption means that the values that u can assume lie within the same
limits, irrespective of the value of X.
➢ For X1, u can assume any value within the range AB;
➢ For X2, u can assume any value within the range CD, which is equal to AB, and so on.
Mathematically: $\operatorname{Var}(U_i) = E[U_i - E(U_i)]^2 = E(U_i^2) = \sigma^2$ (since $E(U_i) = 0$).
Assumptions of the CLRM…
E) The random variable (U) has a normal distribution
• This means the values of u (for each x) have a bell shaped symmetrical
distribution about their zero mean and constant variance, i.e. $U_i \sim N(0, \sigma^2)$.
F) The random terms of different observations are independent (the assumption of no autocorrelation).
• This means the value which the random term assumes in one period
does not depend on the value which it assumed in any other period.
• Algebraically, $\operatorname{Cov}(u_i, u_j) = E\{[u_i - E(u_i)][u_j - E(u_j)]\} = E(u_i u_j) = 0$ for $i \neq j$.


Assumptions of the CLRM…
G. The $X_i$ are a set of fixed values in the hypothetical process of
repeated sampling which underlies the linear regression model.
• This means that, in taking a large number of samples on Y and X, the
Xi values are the same in all samples, but the ui values differ from
sample to sample, and so of course do the values of yi.
H. The random variable (U) is independent of the explanatory variables.
• This means there is no correlation between the random variable and
the explanatory variable.
• If two variables are independent, their covariance is zero.
• Hence $\operatorname{Cov}(X_i, U_i) = 0$.
Methods of estimation
❑Specifying the model and stating its underlying assumptions are the first stage
of any econometric application.

❑The next step is the estimation of the numerical values of the parameters of
economic relationships.

❑ The parameters of the simple linear regression model can be estimated by
various methods. Three of the most commonly used methods are:
1. Ordinary least squares method (OLS)
2. Maximum likelihood method (MLM)
3. Method of moments (MM)
The ordinary least square (OLS) method
• OLS is the most common method used to fit a line to data.
• What we actually do is take each vertical distance from a data point to the line and square it (i.e. take the area of each
of the squares in the diagram), and then minimise the total sum of the squares (hence
least squares).
• Tightening up the notation, let
✓ Yi & Xi denote the actual data points.
✓ $Y_i = \alpha + \beta X_i + U_i$ be the true relationship between Y and X.
✓ α & β denote the population parameters.
✓ $Y_i = \hat{\alpha} + \hat{\beta} X_i + e_i$ be the estimated relationship between Y and X.
✓ $\hat{\alpha}$ & $\hat{\beta}$ denote the estimators of α & β determined from the sample.
✓ $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$ denote the fitted value from the regression line.
✓ $e_i = Y_i - (\hat{\alpha} + \hat{\beta} X_i)$ denote the residual.
[Figure: Actual and fitted value, showing an observation yi, its fitted value ŷi on the regression line at xi, and the residual ûi between them]
The OLS method…Determining $\hat{\alpha}$ & $\hat{\beta}$
• Estimation of α and β by the method of ordinary least squares (OLS), or classical least
squares (CLS), involves finding values for the estimates which
minimize the sum of the squared residuals ($\sum e_i^2$).
• From the estimated relationship $Y_i = \hat{\alpha} + \hat{\beta} X_i + e_i$ we obtain:
$$\sum e_i^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$$
• To find the values of $\hat{\alpha}$ & $\hat{\beta}$ that minimize this sum, we
partially differentiate $\sum e_i^2$ with respect to $\hat{\alpha}$ & $\hat{\beta}$ and set the partial
derivatives equal to zero.
1. $\dfrac{\partial \sum e_i^2}{\partial \hat{\alpha}} = -2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$ ……(eq1)
• Rearranging (eq1), we get: $\sum Y_i = n\hat{\alpha} + \hat{\beta} \sum X_i$ ……(eq2)
• If we divide (eq2) by n and rearrange, we get
$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$ ……(eq3)
The OLS method…Determining $\hat{\alpha}$ & $\hat{\beta}$
• Now partially differentiate $\sum e_i^2$ with respect to $\hat{\beta}$ and set the partial
derivative equal to zero.
2. $\dfrac{\partial \sum e_i^2}{\partial \hat{\beta}} = -2 \sum X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$ ……(eq4)
• Rearranging (eq4) provides $\sum Y_i X_i = \hat{\alpha} \sum X_i + \hat{\beta} \sum X_i^2$ ……(eq5)
• Substituting the value of $\hat{\alpha}$ from (eq3) into (eq5), we get:
$$\hat{\beta} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$ ……(eq6)


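The OLS method…Determining $\hat{\alpha}$ & $\hat{\beta}$
• Equation 6, $\hat{\beta} = \dfrac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$, can be rewritten as follows.
• The numerator: $\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - n\bar{X}\bar{Y}$
• The denominator: $\sum (X - \bar{X})^2 = \sum X^2 - n\bar{X}^2$
• Substituting the numerator and denominator equivalents, we get
$$\hat{\beta} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
• Thus, using the OLS method, $\hat{\alpha}$ & $\hat{\beta}$ are:
$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} \quad \text{and} \quad \hat{\beta} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$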
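A minimal Python sketch, assuming plain lists of numbers, can confirm numerically that the raw-sums form (eq6) and the deviation form give the same slope, with the intercept taken from (eq3). The function names and the four data points are hypothetical, chosen only for illustration.

```python
def beta_raw_sums(x, y):
    """Slope from eq6: (sum XY - n*Xbar*Ybar) / (sum X^2 - n*Xbar^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    den = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    return num / den


def ols_deviation_form(x, y):
    """Return (alpha_hat, beta_hat) using the deviation-from-mean formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar  # eq3
    return alpha_hat, beta_hat


# Hypothetical illustrative data
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 4.0]
alpha_hat, beta_hat = ols_deviation_form(x, y)
print(alpha_hat, beta_hat, beta_raw_sums(x, y))
```

Both forms should give a slope of 0.7 and an intercept of 2.0 for these points (up to floating-point rounding). The deviation form is usually preferred in practice, because subtracting large, nearly equal sums in eq6 can lose numerical precision.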
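The OLS method…Determining $\hat{\alpha}$ & $\hat{\beta}$
Example
• Determine the coefficients of the estimated line that best fits the
CAPM data introduced earlier (five years of excess market and excess asset returns).
Remember to use
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \quad \text{and} \quad \hat{\beta} = \frac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2} = \frac{\operatorname{Cov}(x, y)}{\sigma_x^2}$$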
Example
• Solution: use the following formulas, or Excel's SLOPE and INTERCEPT functions:
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{72.9}{5} = 14.58$
$\bar{y} = \dfrac{\sum y_i}{n} = \dfrac{111}{5} = 22.2$
$\hat{\beta} = \dfrac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2} = \dfrac{236.72}{144.188} = 1.64$
$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 22.2 - 1.6417(14.58) = -1.74$
Thus: $\hat{y}_t = -1.74 + 1.64x_t$
[Figure: scatter plot of excess asset return against excess market return, with fitted line y = 1.6417x - 1.7366]
What do we use $\hat{\alpha}$ and $\hat{\beta}$ for?
• In the CAPM example used above, plugging the 5 observations into the
formulae given above leads to the estimates
$\hat{\alpha} = -1.74$ and $\hat{\beta} = 1.64$. We would write the fitted line as:
$\hat{y}_t = -1.74 + 1.64x_t$
• Question: If an analyst tells you that she expects the market to yield a return 20%
higher than the risk-free rate next year, what would you expect the return on fund
XXX to be?
• Solution: The expected value of y is $-1.74 + 1.64 \times$ (value of x), so
plug x = 20 into the equation to get the expected value of y:
$\hat{y}_i = -1.74 + 1.64 \times 20 = 31.06$, i.e. an expected return 31.06% above the risk-free rate.
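A small sketch of the forecast, assuming the fitted line above; `predict_excess_return` is a hypothetical helper name. Running it a second time with the unrounded estimates shows how much the two-decimal rounding matters.

```python
def predict_excess_return(x, alpha_hat=-1.74, beta_hat=1.64):
    """Fitted value y_hat = alpha_hat + beta_hat * x (rounded slide estimates by default)."""
    return alpha_hat + beta_hat * x


# Analyst's view: market excess return of 20 percentage points
print(round(predict_excess_return(20.0), 2))
# Same forecast with the unrounded estimates (alpha about -1.7366, beta about 1.6417)
print(round(predict_excess_return(20.0, -1.7366, 1.6417), 2))
```

With the rounded estimates the forecast is 31.06; with the unrounded ones it is about 31.10, a difference of roughly 0.04 percentage points.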


Properties of the OLS Estimator
• If the assumptions of zero mean (C), constant variance (D), no autocorrelation (F) and independence of U from X (H) hold, then the estimators $\hat{\alpha}$ and $\hat{\beta}$
determined by OLS are known as Best Linear Unbiased Estimators
(BLUE).
• What does the acronym stand for?
✓ “Estimator” => $\hat{\beta}$ is an estimator of the true value of β.
✓ “Linear” => $\hat{\beta}$ is a linear estimator (a linear function of the observations on y).
✓ “Unbiased” => On average, the values of $\hat{\alpha}$ and $\hat{\beta}$
will be equal to the true values, i.e. $E(\hat{\alpha}) = \alpha$ and $E(\hat{\beta}) = \beta$.
✓ “Best” => means that the OLS estimator has minimum
variance among the class of linear unbiased estimators.
NB. Estimators are the formulae used to calculate the coefficients; estimates are the
actual numerical values of the coefficients.
Linearity
• In order to use OLS, we need a model which is linear in the parameters (α and β).
It does not necessarily have to be linear in the variables (y and x).
• Linear in the parameters means that the parameters are not multiplied together,
divided, squared, cubed, etc.
• Some models can be transformed into linear ones by a suitable substitution or
manipulation, e.g. the exponential regression model
$$Y_t = e^{\alpha} X_t^{\beta} e^{u_t} \;\Leftrightarrow\; \ln Y_t = \alpha + \beta \ln X_t + u_t$$
• Then let $y_t = \ln Y_t$ and $x_t = \ln X_t$:
$$y_t = \alpha + \beta x_t + u_t$$
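To see that the log transformation really turns the exponential model into one OLS can handle, the following sketch generates hypothetical, noise-free data from $Y_t = e^{\alpha} X_t^{\beta}$ (with α = 0.5 and β = 1.2, values chosen only for illustration) and regresses $\ln Y_t$ on $\ln X_t$. Because there is no error term, OLS should recover the parameters exactly, up to floating-point rounding.

```python
import math


def ols(x, y):
    """OLS intercept and slope for a simple regression (deviation form)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - beta_hat * x_bar, beta_hat


# Hypothetical noise-free data generated from Y = e^alpha * X^beta
alpha_true, beta_true = 0.5, 1.2
X = [1.0, 2.0, 4.0, 8.0, 16.0]
Y = [math.exp(alpha_true) * Xi ** beta_true for Xi in X]

# Transform: y_t = ln Y_t and x_t = ln X_t, then run ordinary OLS
x = [math.log(Xi) for Xi in X]
y = [math.log(Yi) for Yi in Y]
alpha_hat, beta_hat = ols(x, y)
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
```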
Linear and Non-linear Models

• Similarly, if theory suggests that y and x should be inversely related:
$$y_t = \alpha + \frac{\beta}{x_t} + u_t$$
then the regression can be estimated using OLS by substituting
$$z_t = \frac{1}{x_t}$$
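The reciprocal substitution can be sketched the same way. The data below are hypothetical and noise-free, generated from $y_t = \alpha + \beta / x_t$ with α = 3 and β = 4 (illustrative values only), so regressing y on $z_t = 1/x_t$ should return those values.

```python
def ols(x, y):
    """OLS intercept and slope for a simple regression (deviation form)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - beta_hat * x_bar, beta_hat


# Hypothetical noise-free data generated from y = alpha + beta / x
alpha_true, beta_true = 3.0, 4.0
x = [1.0, 2.0, 4.0, 5.0, 10.0]
y = [alpha_true + beta_true / xi for xi in x]

z = [1.0 / xi for xi in x]          # substitution z_t = 1 / x_t
alpha_hat, beta_hat = ols(z, y)     # linear regression of y on z
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
```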
• But some models are intrinsically non-linear, e.g.
$$y_t = \alpha + \beta x_t^{\gamma} + u_t$$
Unbiased
• Unbiasedness is one of the most desirable properties of any
estimator.
• It is the basic minimum requirement to be satisfied by any
estimator.
• The unbiasedness property of OLS says that if you
repeatedly draw samples (of, say, 50 observations), then after many repeated
attempts you would find that the average of all
the $\hat{\alpha}$ and $\hat{\beta}$ estimates from the samples equals the actual (or
population) values of α and β.
• However, unbiasedness alone is not sufficient, because in most real-life
applications you will not have the luxury of drawing
repeated samples. In fact, only one sample will be available in
most cases.
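Unbiasedness is a statement about repeated sampling, so it can be illustrated with a small Monte Carlo simulation. The sketch below is an illustration built on assumptions, not part of the original material: it fixes 50 X values (assumption G), draws normal errors, re-estimates α and β in each of 5,000 samples, and averages the estimates. The true values α = 2 and β = 0.5, the error standard deviation and the seed are all hypothetical.

```python
import random


def ols(x, y):
    """Return (alpha_hat, beta_hat) from the deviation-form OLS formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - beta_hat * x_bar, beta_hat


def average_estimates(alpha=2.0, beta=0.5, n=50, reps=5000, sigma=1.0, seed=42):
    """Average OLS estimates over many samples drawn from y = alpha + beta*x + u."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]  # X fixed in repeated samples
    alpha_total = beta_total = 0.0
    for _ in range(reps):
        # New error draws in each sample: u ~ N(0, sigma^2)
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        a_hat, b_hat = ols(x, y)
        alpha_total += a_hat
        beta_total += b_hat
    return alpha_total / reps, beta_total / reps


mean_alpha, mean_beta = average_estimates()
print(f"average alpha_hat = {mean_alpha:.3f}, average beta_hat = {mean_beta:.3f}")
```

The averages should land very close to 2 and 0.5, even though the estimates from any single sample can be noticeably off.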

Minimum variance of $\hat{\alpha}$ and $\hat{\beta}$
• The OLS estimator with minimum variance among the class of linear
unbiased estimators is a Best Estimator.
• Generally
1. If the estimator is unbiased but doesn’t have the least
variance – it’s not the best!
2. If the estimator has the least variance but is biased – it’s
again not the best!
3. If the estimator is both unbiased and has the least variance
– it’s the best estimator.
• An estimator which is unbiased and has the least variance is said to be
efficient.

Estimating the Variance of the Error Term, $\hat{\alpha}$ & $\hat{\beta}$
▪ The variance of an estimated coefficient is judged to be small enough when its standard
error is less than half of the (absolute value of the) regression coefficient,
▪ i.e. when $SE(\hat{\beta}) < \tfrac{1}{2}|\hat{\beta}|$.
▪ How do we compute the variance/standard error of the parameters?
• From basic statistics courses, recall how the variance & standard deviation
are computed:
Variance: $\sigma^2 = \dfrac{\sum (x_i - \bar{x})^2}{N}$
Standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{N}}$
• What does each measure?
• Now let us see how the variability of the error term & the parameters is computed.
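…Variance of $e_i$, $\hat{\alpha}$ and $\hat{\beta}$
• The variance of the random variable $u_t$ is given by
$\operatorname{Var}(u_t) = E[u_t - E(u_t)]^2$, where $E(u_t) = 0$,
• which reduces to
$\operatorname{Var}(u_t) = E(u_t^2)$
• We could estimate this using the average of $u_t^2$: $s^2 = \dfrac{1}{n}\sum u_t^2$
• Unfortunately this is not workable since $u_t$ is not observable. We can use the sample
counterpart to $u_t$, which is $\hat{u}_t$ or $e_t$:
$s^2 = \dfrac{\sum e_t^2}{n}$
➢ But this estimator is a biased estimator of $\sigma^2$, because two degrees of freedom are used up in estimating α and β.
$s^2 = \dfrac{\sum e_t^2}{n - 2}$
➢ This estimator is an unbiased estimator of $\sigma^2$.
❑ The corresponding estimator of σ (the standard error of the regression) is $s = \sqrt{\dfrac{\sum e_t^2}{n - 2}}$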
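The bias of dividing by n, and the fix of dividing by n - 2, can also be seen by simulation. In this illustrative sketch, the five x values of the CAPM example are held fixed, errors are drawn with true variance σ² = 1, and both estimators are averaged over 20,000 samples; the "true" line -1.74 + 1.64x used to generate y is an assumption borrowed from the fitted example, and the seed is arbitrary.

```python
import random


def residual_sum_of_squares(x, y):
    """Fit y on x by OLS and return the sum of squared residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))


def compare_variance_estimators(sigma2=1.0, reps=20000, seed=1):
    """Average RSS/n and RSS/(n - 2) over repeated samples with known error variance."""
    rng = random.Random(seed)
    x = [13.7, 23.2, 6.9, 16.8, 12.3]  # fixed regressor values, n = 5
    n = len(x)
    biased_total = unbiased_total = 0.0
    for _ in range(reps):
        y = [-1.74 + 1.64 * xi + rng.gauss(0.0, sigma2 ** 0.5) for xi in x]
        rss = residual_sum_of_squares(x, y)
        biased_total += rss / n
        unbiased_total += rss / (n - 2)
    return biased_total / reps, unbiased_total / reps


biased, unbiased = compare_variance_estimators()
print(f"mean of RSS/n = {biased:.3f}, mean of RSS/(n-2) = {unbiased:.3f}")
```

Since $E(\sum e_t^2) = (n-2)\sigma^2$, the first average should be close to $(n-2)/n = 0.6$ and the second close to 1.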
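…Variance of $e_i$, $\hat{\alpha}$ and $\hat{\beta}$
• The variances of the parameters are given by
$$\operatorname{Var}(\hat{\beta}) = s^2 \cdot \frac{1}{\sum (x - \bar{x})^2} = \frac{\sum e_t^2}{(n - 2)\sum (x - \bar{x})^2}$$
$$\operatorname{Var}(\hat{\alpha}) = s^2 \cdot \frac{\sum x^2}{n \sum (x - \bar{x})^2} = \frac{\sum e_t^2 \sum x^2}{n(n - 2)\sum (x - \bar{x})^2}$$
• The standard errors of the parameters are given by
$$SE(\hat{\beta}) = \sqrt{\operatorname{Var}(\hat{\beta})}, \qquad SE(\hat{\alpha}) = \sqrt{\operatorname{Var}(\hat{\alpha})}$$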
…Variance of $e_i$, $\hat{\alpha}$ and $\hat{\beta}$
Example: use the previous example to compute the variances and standard errors
of the parameters.
• Earlier we computed the
estimated coefficients to be
$\hat{y}_t = -1.74 + 1.64x_t$
Required: Compute the variances and standard errors of the coefficients.
NB. We use the following formulas:
$$\operatorname{Var}(\hat{\beta}) = \frac{\sum e_t^2}{n - 2} \cdot \frac{1}{\sum (x - \bar{x})^2} \qquad \operatorname{Var}(\hat{\alpha}) = \frac{\sum e_t^2}{n - 2} \cdot \frac{\sum x^2}{n \sum (x - \bar{x})^2}$$
Solution (values rounded to two decimals)
Obs.   x      y      ŷ        e       e²      x²        x − x̄    (x − x̄)²
1      13.7   17.8   20.74    -2.94   8.63    187.69    -0.88    0.77
2      23.2   39.0   36.32    2.68    7.19    538.24    8.62     74.30
3      6.9    12.8   9.59     3.21    10.33   47.61     -7.68    58.98
4      16.8   24.2   25.82    -1.62   2.63    282.24    2.22     4.93
5      12.3   17.2   18.44    -1.24   1.54    151.29    -2.28    5.20
Sum    72.9   111    110.91   0       30.33   1207.07   0.00     144.19
$$\operatorname{Var}(\hat{\beta}) = \frac{30.33}{5 - 2} \cdot \frac{1}{144.19} = 0.07 \qquad \operatorname{Var}(\hat{\alpha}) = \frac{30.33}{5 - 2} \cdot \frac{1207.07}{5 \times 144.19} = 16.93$$
$$SE(\hat{\beta}) = \sqrt{\operatorname{Var}(\hat{\beta})} = \sqrt{0.07} = 0.26 \qquad SE(\hat{\alpha}) = \sqrt{\operatorname{Var}(\hat{\alpha})} = \sqrt{16.93} = 4.11$$
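The table and the variance formulas can be reproduced with a short Python sketch (variable names are ours). It works with unrounded intermediate values, so its results can differ from the slide's in the last decimal place.

```python
import math

x = [13.7, 23.2, 6.9, 16.8, 12.3]   # excess market return
y = [17.8, 39.0, 12.8, 24.2, 17.2]  # excess asset return
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
alpha_hat = y_bar - beta_hat * x_bar

residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
rss = sum(e ** 2 for e in residuals)   # sum of e_t^2
s2 = rss / (n - 2)                     # unbiased estimate of sigma^2

var_beta = s2 / sxx
var_alpha = s2 * sum(xi ** 2 for xi in x) / (n * sxx)
se_beta, se_alpha = math.sqrt(var_beta), math.sqrt(var_alpha)
print(f"RSS = {rss:.2f}, Var(beta) = {var_beta:.4f}, SE(beta) = {se_beta:.4f}")
print(f"Var(alpha) = {var_alpha:.2f}, SE(alpha) = {se_alpha:.2f}")
```

This gives RSS ≈ 30.33, Var(β̂) ≈ 0.070, SE(β̂) ≈ 0.26, Var(α̂) ≈ 16.92 (16.93 on the slide, which uses the rounded RSS) and SE(α̂) ≈ 4.11.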
Goodness Of Fit
• After estimating the parameters and determining the
least squares regression line, the question is: how well does the
estimated regression equation fit the data?
• That is to say, we need to measure the dispersion of the observations
around the regression line.
• The closer the observations are to the line, the better the goodness of fit,
i.e. the better the explanation of the variation in Y by changes
in the explanatory variables.


Goodness…
• The two most commonly used statistical criteria (first order
tests) in econometric analysis are:
1. The coefficient of determination (the square of the correlation
coefficient i.e. R2). This test is used for judging the explanatory
power of the independent variable(s).
2. The standard error tests of the estimators. This test is used for
judging the statistical reliability of the estimates of the regression
coefficients.



Goodness…
1. TESTS OF THE ‘GOODNESS OF FIT’ WITH R2
• R² shows the proportion (percentage) of the total variation of the dependent variable that can
be explained by changes in the explanatory variable(s) included in the
model.
• Consider the following diagram to understand more about R²:
[Figure: at a point X, the total variation $Y - \bar{Y}$ split into the explained variation $\hat{Y} - \bar{Y}$ and the unexplained variation $e = Y - \hat{Y}$, around the regression line $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$]
Where:
$e_i = Y_i - \hat{Y}_i$ = deviation of the observation $Y_i$ from the regression line.
$Y - \bar{Y}$ = deviation of Y from its mean.
$\hat{Y} - \bar{Y}$ = deviation of the regressed (predicted) value ($\hat{Y}$) from the mean.
Goodness… WITH R2
• As can be seen from the figure above, $Y - \bar{Y}$ measures the deviation of the
sample observation of the dependent variable from its mean, which is
attributable to:
1. the variation in Y as a result of the influence of X (i.e. the
regression line), given by the vertical distance $\hat{Y} - \bar{Y}$;
2. the residual variation $e = Y - \hat{Y}$,
i.e. $(Y - \bar{Y}) = (\hat{Y} - \bar{Y}) + (Y - \hat{Y})$.
• Squaring and summing over all observations, the cross-product term $2\sum(\hat{Y} - \bar{Y})e$ vanishes, because the normal equations (eq1) and (eq4) give $\sum e = 0$ and $\sum Xe = 0$. This leaves:
$$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$$
TSS = ESS + RSS, where
• TSS: Total sum of squares
• ESS: Explained sum of squares
• RSS: Residual sum of squares
Goodness… WITH R2
• TSS = ESS + RSS:
$$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum e^2$$
• Dividing both sides by TSS gives $1 = \dfrac{ESS}{TSS} + \dfrac{RSS}{TSS}$, and R² is defined as the explained share:
$$R^2 = \frac{ESS}{TSS}$$
• But since TSS = ESS + RSS, we can also write
$$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$$
• R² must always lie between zero and one. To understand this, consider
two extremes:
RSS = TSS, i.e. ESS = 0, so R² = ESS/TSS = 0
ESS = TSS, i.e. RSS = 0, so R² = ESS/TSS = 1
Goodness… WITH R2

The Limit Cases: R² = 0 and R² = 1
[Figure: two scatter plots of yt against xt, one illustrating R² = 0 and the other R² = 1]

Financial Econometrics 45
Goodness… WITH R2

• Interpretation of R²
• Suppose R² = 0.9. This means that the regression line gives a good fit to
the observed data, since the line explains 90% of the total variation of
the Y values around their mean. The remaining 10% of the total
variation in Y is unaccounted for by the regression line and is
attributed to the factors captured by the disturbance term.


Goodness… WITH R2
Example: To illustrate, suppose data were collected from a sample of 10
restaurants located near educational institutions in Injibara. For the ith
observation or restaurant in the sample, xi is the size of the student population
(in thousands) and yi is the quarterly sales (in thousands of dollars).
The values of xi and yi for the 10 restaurants in the sample are summarized in the
table below.
Required
1. Estimate the coefficients of the regression line based on OLS.
2. Predict quarterly sales for a restaurant to be located near a
campus with 16,000 students.
3. Compute R².
Solution: Calculations for the least squares
estimated regression equation



Solution: Graph of the estimated regression equation

2. We would predict quarterly sales of $140,000 for a restaurant
with a nearby student population of 16,000.


3. Solution: Computations of R²
Computation of the residual sum of squares $\sum (Y - \hat{Y})^2$ and the total sum of squares $\sum (Y - \bar{Y})^2$:
RSS = 1,530    TSS = 15,730
• Computation of the error for the first restaurant: $Y - \hat{Y} = 58 - 70 = -12$
• Computation of the squared deviation from the mean for the first restaurant: $(Y - \bar{Y})^2 = (58 - 130)^2 = 5{,}184$
where $\bar{Y} = \dfrac{58 + 105 + \cdots + 202}{10} = 130$
NB: $TSS = \sum (Y - \bar{Y})^2$, $ESS = \sum (\hat{Y} - \bar{Y})^2$, $RSS = \sum (Y - \hat{Y})^2$
3. Solution: Computations of R²…
• From the above tables, TSS & RSS are computed to be
TSS = 15,730
RSS = 1,530
ESS = 15,730 − 1,530 = 14,200
$$R^2 = \frac{ESS}{TSS} = \frac{14{,}200}{15{,}730} = 0.9027$$
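As a quick arithmetic check, both expressions for R² give the same value from the reported sums of squares; this sketch uses only the TSS and RSS stated above.

```python
tss = 15_730  # total sum of squares (from the solution table)
rss = 1_530   # residual sum of squares (from the solution table)
ess = tss - rss

r2_from_ess = ess / tss      # R^2 = ESS / TSS
r2_from_rss = 1 - rss / tss  # R^2 = 1 - RSS / TSS
print(f"ESS = {ess}, R^2 = {r2_from_ess:.4f} = {r2_from_rss:.4f}")
```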


Do-It-Yourself Exercise
Example
• Determine the coefficients of the estimated line that best fits the
following data and compute R².


Hypothesis Testing: Some Concepts
• Hypothesis is a tentative statement about a population developed for
testing.
• In statistical analysis, we make a claim, that is, state a hypothesis, collect
data, and then use the data to test the assertion.
• For instance
1. In a legal system, a person is innocent until proven guilty. A
jury hypothesizes that a person charged with a crime is innocent and
subjects this hypothesis to verification by reviewing the evidence and
hearing testimony before reaching a verdict.
2. In a similar sense, a patient goes to a physician and reports various
symptoms. On the basis of the symptoms, the physician will order certain
diagnostic tests and then, according to the symptoms and the test results,
determine the treatment to be followed.
What is Hypothesis Testing?
• Hypothesis testing is a procedure based on sample evidence and
probability theory to determine whether the hypothesis is a reasonable
statement.
• Hypothesis testing starts with a statement, or assumption (hypothesis),
about a population parameter, such as the population mean.
For instance:
• A hypothesis might be that the mean monthly salary of fresh
accounting graduates is Br12,000.
• To verify this directly we would need to contact all fresh accounting graduates,
which is not easy. To test the validity of the assumption (µ =
Br12,000), we instead select a sample from the population of all fresh
accounting graduates, calculate sample statistics, and, based on certain
decision rules, reject or fail to reject the hypothesis.
….hypothesis testing

• What would you decide if the sample mean turned out to be
A. Br9,000? B. Br11,990?
• Would you reach the same conclusion in A & B? No!
• In case B, can we attribute the difference of Br10 between the sample mean and the hypothesized mean to
sampling error, or is that difference statistically significant?


Hypothesis Testing: Some Concepts
• Once we determine the coefficient estimates of a regression model, we
can test hypotheses about, and draw inferences on, the population
characteristics.
• We will always have two hypotheses that go together: the null
hypothesis (denoted H0) and the alternative hypothesis (denoted H1).
• The null hypothesis is the statement or the statistical hypothesis that is
actually being tested. The alternative hypothesis represents the
remaining outcomes of interest.


Hypothesis Testing
• Null hypothesis H0 is a tentative assumption about a population
parameter such as a population mean or a population proportion.
• The alternative hypothesis H1 is a statement that is the opposite of
what is stated in the null hypothesis.
• For example, suppose that, given the regression results above, we are
interested in the hypothesis that the true value of β is in fact 0.5. We
would use the notation
H0: β = 0.5
H1: β ≠ 0.5
=> This would be known as a two-sided test.
• Sometimes we may have some prior information that, for example, we would
expect β > 0.5 rather than β < 0.5. In this case, we would do a one-sided test: H0: β = 0.5 and H1: β > 0.5, or we could have had H0: β = 0.5 and H1: β < 0.5.
Hypothesis…
The null hypothesis, is presumed to be true, until the data
provides sufficient evidence that it is not.
If we fail to reject the null hypothesis, it does not mean the
null hypothesis is true.
A hypothesis test does not determine which hypothesis
is true, or which is most likely: it only assesses whether the
available evidence is sufficient to reject the null hypothesis.
We usually talk about hypothesis testing in terms of the
null, i.e. we either reject or fail to reject the null;
we never accept the null. As such, if we reject the null, then
we “accept” (i.e. we are left with) the alternative.
TESTING THE SIGNIFICANCE OF OLS PARAMETERS

• To test the significance of the OLS parameter estimators we need:


✓Variance of the parameter estimators
✓ An unbiased estimator of the error variance, σ²
✓ The assumption of normality of the distribution of the error term.
• We have already derived that:
$$\operatorname{Var}(\hat{\beta}) = \frac{\hat{\sigma}^2}{\sum (x - \bar{x})^2}, \qquad \operatorname{Var}(\hat{\alpha}) = \frac{\hat{\sigma}^2 \sum x^2}{n \sum (x - \bar{x})^2}, \qquad \hat{\sigma}^2 = \frac{\sum e^2}{n - 2} = \frac{RSS}{n - 2}$$
• The most common tests of the parameters are:
i) Standard error test
ii) t-test (critical value approach, confidence interval approach & p-value approach)


The standard error tests of the estimators
• This test is used for judging the statistical reliability of the estimates of
the regression coefficients.
• We mentioned that one of the qualities of the best estimator is minimum
variance of $\hat{\alpha}$ and $\hat{\beta}$.
• The OLS estimator with minimum variance among the class of linear
unbiased estimators is the best estimator.
• The OLS estimator with minimum variance will have the minimum
standard error.


The standard error test…
• Formally, we test the null hypothesis H0: β = 0 against the alternative hypothesis H1: β ≠ 0.
• This test involves:
1. Computing the standard error of each parameter, $SE(\hat\beta)=\sqrt{\operatorname{Var}(\hat\beta)}$ (and likewise for $\hat\alpha$).
2. Comparing the standard errors with the numerical values of $\hat\alpha$ and $\hat\beta$.
Decision rule:
✓ If $SE(\hat\beta) < \tfrac{1}{2}|\hat\beta|$, reject the null hypothesis and accept the alternative hypothesis. We conclude that $\hat\beta$ is statistically significant.
✓ If $SE(\hat\beta) > \tfrac{1}{2}|\hat\beta|$, do not reject the null hypothesis. We conclude that $\hat\beta$ is statistically insignificant.
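A minimal Python sketch of this decision rule follows. The function name `se_test_significant` is our own. Using $|\hat\beta|$, so that the rule also works for negative estimates, is our assumption. As a check, the sketch applies the rule to the estimates from the 22-observation example later in this chapter.

```python
def se_test_significant(estimate, se):
    """Standard error test: True means reject H0 (the coefficient is 'significant')."""
    # SE < |estimate| / 2 is the same as |estimate / SE| > 2,
    # i.e. a rough t-test with a critical value of 2.
    # A tie (SE exactly equal to |estimate| / 2) is treated as "do not reject".
    return se < abs(estimate) / 2

# Estimates and standard errors from the 22-observation example in this chapter.
for name, est, se in [("intercept", 20.3, 14.38), ("slope", 0.5091, 0.2561)]:
    verdict = "significant" if se_test_significant(est, se) else "insignificant"
    print(f"{name}: estimate={est}, SE={se} -> {verdict}")
```

Both coefficients in that example fail the test: 14.38 > 20.3/2 and 0.2561 > 0.5091/2.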
T-Tests
T-tests consist of:
1. Specification of the null hypothesis, H0, and the alternative hypothesis, H1;
2. Specification of the test statistic and its distribution under the null hypothesis;
3. Selection of the significance level α in order to determine the rejection region; 5% is usually used in business and social science;
4. Calculation of the test statistic from the data sample;
5. Conclusions, which are based on the test statistic and the rejection region.
[Figure: density f(x) with rejection regions. Two-tailed test: a 2.5% rejection region in each tail around a 95% non-rejection region. One-tailed test: a single 5% rejection region beside a 95% non-rejection region.]
Specification of the null hypothesis, H0, and the alternative hypothesis, H1
• Assume a regression equation $y = \alpha + \beta x + u$.
❖The null hypothesis is denoted by H0, and for the univariate
regression can be stated as:
𝐻0 : 𝛽 = 𝑐
✓ where c is a constant value, which we are interested in. When
testing the null hypothesis, we may either reject or fail to
reject the null hypothesis.
❖What could be the alternative hypothesis for the above null
hypothesis?



• We may specify the alternative hypothesis in three possible ways:
1. 𝐻1 : 𝛽 > 𝑐 - rejecting 𝐻0 leads us to "accept" the conclusion that β > c.
2. 𝐻1 : 𝛽 < 𝑐 - rejecting 𝐻0 leads us to "accept" the conclusion that β < c.
3. 𝐻1 : 𝛽 ≠ 𝑐 - rejecting 𝐻0 leads us to "accept" the conclusion that β is either greater or smaller than c.
• How do we choose among the options?
❖ By referring to theory or prior studies.
❖ Theories frequently provide information about the signs of the parameters.
❖ For example, economic theory strongly suggests that food expenditure will rise if income increases, so we would test H0: β = 0 against 𝐻1 : β > 0.
• After the null and alternative hypotheses are specified, the next task is to calculate the test statistic, which is given by
$$\text{test statistic} = \frac{\hat\beta - \beta^*}{SE(\hat\beta)}$$
where $\beta^*$ is the value of β under the null hypothesis.
• Next we need a tabulated distribution with which to compare the calculated test statistic, i.e. to find the critical values (rejection regions) at a specified level of significance.
Rejection region:
• Consists of values that have a low probability of occurring if the null hypothesis is true.
• Its size depends on the level of significance (α), and its location depends on the specification of the alternative hypothesis.
• If the calculated test statistic falls in the rejection region, then it is unlikely that the null hypothesis holds true.
• The size of the rejection region is determined by choosing a level of significance α, the probability of the "unlikely" event. It is usually 0.01, 0.05 or 0.10.
• To decide whether to reject the null hypothesis, we compare the calculated t-statistic with the critical value. The critical value is read from the t-table using α and T − k degrees of freedom, where T is the number of observations and k is the number of parameters.
• If the test statistic lies in the rejection region, reject the null hypothesis (H0).
Hypothesis…
Develop a rejection rule for the following cases.
[Figure: rejection regions for a two-sided 5% hypothesis test: a 2.5% rejection region below −tCrit and a 2.5% rejection region above +tCrit, with a 95% non-rejection region between them.]
[Figure: rejection region for a one-sided 5% hypothesis test of H0 : β = β∗ against H1 : β > β∗: a 5% rejection region above +tCrit and a 95% non-rejection region below it.]
[Figure: rejection region for a one-sided 5% hypothesis test of H0 : β = β∗ against H1 : β < β∗: a 5% rejection region below −tCrit and a 95% non-rejection region above it.]
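The three rejection regions pictured above can be written as one small decision function. This is a hypothetical sketch: the name `t_test_decision` and the `alternative` labels are ours. It assumes the critical value t_crit has already been read from a t-table with n − k degrees of freedom, using $t_{\alpha/2}$ for the two-sided case and $t_{\alpha}$ for a one-sided case.

```python
def t_test_decision(beta_hat, se, beta_star, t_crit, alternative="two-sided"):
    """Return (t statistic, reject H0?) for H0: beta = beta_star.

    t_crit is a positive critical value read from a t-table with n - k d.f.:
    t_{alpha/2} for a two-sided test, t_{alpha} for a one-sided test.
    """
    t_stat = (beta_hat - beta_star) / se
    if alternative == "two-sided":      # H1: beta != beta_star, reject in either tail
        reject = abs(t_stat) > t_crit
    elif alternative == "greater":      # H1: beta > beta_star, reject in the upper tail
        reject = t_stat > t_crit
    elif alternative == "less":         # H1: beta < beta_star, reject in the lower tail
        reject = t_stat < -t_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return t_stat, reject

# Slope from the 22-observation example in this chapter (20 d.f.), H0: beta = 1.
print(t_test_decision(0.5091, 0.2561, 1.0, 2.086, "two-sided"))  # 5% two-sided test
print(t_test_decision(0.5091, 0.2561, 1.0, 1.725, "less"))       # 5% one-sided test
```

With the same t ≈ −1.917, the two-sided 5% test (critical values ±2.086) does not reject. A one-sided 5% test against H1: β < 1 (critical value −1.725, 20 d.f.) would reject. This is why the alternative must be chosen before looking at the data.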
Hypothesis…

Confidence interval approach
• Provides the same result as the critical value approach (for a two-sided test at the same significance level).
• It involves constructing an interval within which the true parameter is expected to lie with a certain "degree of confidence", i.e. establishing limiting values within which the parameter is expected to lie.
• Level of confidence = 1 − level of significance (α)
✓ Interval estimate of βi = $\hat\beta_i \pm t_{crit} \times SE(\hat\beta_i)$
❑ The confidence interval shows the range of hypothesized values of βi that would not be rejected.
❑ Decision rule: reject H0 if the hypothesized value $\beta^*$ does not lie within the interval.
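The claim that both approaches give the same answer can be checked numerically. The following minimal sketch uses helper names of our own. It compares the interval rule with the two-sided critical value rule over a grid of hypothesized values, reusing the slope estimate, standard error and 20-d.f. critical value from the example that follows.

```python
def confidence_interval(beta_hat, se, t_crit):
    """Interval estimate beta_hat +/- t_crit * SE(beta_hat)."""
    half_width = t_crit * se
    return beta_hat - half_width, beta_hat + half_width

def reject_by_interval(beta_hat, se, beta_star, t_crit):
    """Reject H0: beta = beta_star when beta_star lies outside the interval."""
    low, high = confidence_interval(beta_hat, se, t_crit)
    return not (low <= beta_star <= high)

def reject_by_critical_value(beta_hat, se, beta_star, t_crit):
    """Two-sided critical value approach: reject when |t| > t_crit."""
    return abs((beta_hat - beta_star) / se) > t_crit

# Compare the two approaches over a grid of hypothesised values beta_star.
beta_hat, se, t_crit = 0.5091, 0.2561, 2.086
grid = [i / 10 for i in range(-20, 31)]
agree = all(
    reject_by_interval(beta_hat, se, b, t_crit) == reject_by_critical_value(beta_hat, se, b, t_crit)
    for b in grid
)
print("approaches agree on every grid point:", agree)  # True
```

The agreement is exact because $\beta^*$ lies outside $\hat\beta \pm t_{crit} SE(\hat\beta)$ precisely when $|\hat\beta - \beta^*| / SE(\hat\beta) > t_{crit}$.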
Example: suppose we have the following regression results obtained from 22 observations: $\hat y = 20.3 + 0.5091x$. The standard errors of the y-intercept and slope are, respectively, 14.38 and 0.2561.
o Required: test the hypothesis that β = 1 at the 5% level of significance.
• Recall the hypothesis-testing procedure described above.
• What are the null and alternative hypotheses?
✓ H0: β = 1 and H1: β ≠ 1
• Compute the test statistic:
$$\text{test stat} = \frac{\hat\beta - \beta^*}{SE(\hat\beta)} = \frac{0.5091 - 1}{0.2561} = -1.917$$
• Read the t-table to find the rejection region (critical values) for $t_{20;5\%}$ (22 − 2 = 20 degrees of freedom). Then make a decision using either the critical value approach or the confidence interval approach.
Determining the Rejection Region
$\hat y = 20.3 + 0.5091x$, with standard errors (14.38) and (0.2561)
• The hypotheses are: H0: β = 1 and H1: β ≠ 1
A. Critical Value Approach
$$\text{test stat} = \frac{\hat\beta - \beta^*}{SE(\hat\beta)} = \frac{0.5091 - 1}{0.2561} = -1.917$$
Does this value lie in the rejection region? With 20 degrees of freedom, the 5% two-sided critical values are −2.086 and +2.086, leaving a 2.5% rejection region in each tail.
❖ Do not reject H0, since the test statistic lies within the non-rejection region.
B. Confidence Interval Approach
✓ Interval estimate of βi = $\hat\beta_i \pm t_{crit} \times SE(\hat\beta_i)$
= 0.5091 ± 2.086(0.2561)
= (−0.0251, 1.0433)
❖ Decision: do not reject H0, since the hypothesized value 1 falls within the interval estimate.
What if
▪ we wanted to test H0: β = 0 or H0: β = 2?
▪ we wanted to use a 10% size of test?
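The arithmetic of this example can be reproduced in a few lines of Python. This is a sketch, and the function name `worked_example` is ours. The defaults are the slide's H0: β = 1 and the 5% two-sided critical value 2.086 for 20 d.f. To explore the "what if" questions, change `beta_star`, or pass the 10% two-sided critical value for 20 d.f. (1.725 in the t-table) as `t_crit`.

```python
def worked_example(beta_star=1.0, t_crit=2.086):
    """Two-sided test of H0: beta = beta_star for the 22-observation example."""
    beta_hat, se = 0.5091, 0.2561        # slope estimate and its standard error
    t_stat = (beta_hat - beta_star) / se
    low, high = beta_hat - t_crit * se, beta_hat + t_crit * se
    reject = abs(t_stat) > t_crit        # same verdict as: beta_star outside (low, high)
    return t_stat, (low, high), reject

t_stat, (low, high), reject = worked_example()
print(f"t = {t_stat:.3f}")                             # t = -1.917
print(f"95% CI = ({low:.4f}, {high:.4f})")             # 95% CI = (-0.0251, 1.0433)
print("reject H0" if reject else "do not reject H0")   # do not reject H0
```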
Some More Terminology

• If we reject the null hypothesis at the 5% level, we say that the result of the test is statistically significant.
• Note that a statistically significant result may be of no practical
significance.
• E.g. if a shipment of cans of beans is expected to weigh 450g per
tin, but the actual mean weight of some tins is 449g, the result
may be highly statistically significant but presumably nobody
would care about 1g of beans.
The Errors That We Can Make Using Hypothesis Tests

• We usually reject H0 if the test statistic is statistically significant at a chosen significance level.
• There are two possible errors we could make:
1. Rejecting H0 when it was really true. This is called a type I error.
2. Not rejecting H0 when it was in fact false. This is called a type II error.

                                        Reality
Result of test                          H0 is true               H0 is false
Significant (reject H0)                 Type I error (= α)       ✓ (correct decision)
Insignificant (do not reject H0)        ✓ (correct decision)     Type II error
The Trade-off Between Type I and Type II Errors
• The probability of a type I error is just α, the significance level or size of test we chose. To see this, recall what significance at the 5% level means: if H0 is true, there is only a 5% chance that a result as extreme as this, or more extreme, could have occurred purely by chance.
• Note that there is no free lunch here! What happens if we reduce the size of the test (e.g. from a 5% test to a 1% test)? We reduce the chance of making a type I error, but we also reduce the probability that we will reject the null hypothesis at all, so we increase the probability of a type II error:
Reduce size of test → stricter criterion for rejection → reject the null hypothesis less often → less likely to falsely reject, but more likely to incorrectly not reject
• So there is always a trade-off between type I and type II errors when choosing a significance level. The only way we can reduce the chances of both is to increase the sample size.
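The trade-off can be illustrated with a small Monte Carlo experiment. This is a sketch under hypothetical assumptions: the true model is y = 1 + βx + u with u ~ N(0, 5²), x is drawn uniformly on [0, 10], and n = 30. The function name `simulate_rejection_rates` is also ours. The critical values 2.048 (5%) and 2.763 (1%) are the standard two-sided t-table values for 28 d.f. When β = 0, the rejection rate estimates the type I error probability. When β ≠ 0, it estimates the power, i.e. 1 − P(type II error).

```python
import random

# Two-sided t-table critical values for N - 2 = 28 degrees of freedom.
N = 30
T_CRIT = {0.05: 2.048, 0.01: 2.763}

def simulate_rejection_rates(beta_true, reps=2000, seed=1):
    """Share of simulated samples in which H0: beta = 0 is rejected, per test size."""
    rng = random.Random(seed)
    rejections = {size: 0 for size in T_CRIT}
    for _ in range(reps):
        # Hypothetical data-generating process: y = 1 + beta_true * x + u, u ~ N(0, 5^2).
        x = [rng.uniform(0, 10) for _ in range(N)]
        y = [1.0 + beta_true * xi + rng.gauss(0, 5) for xi in x]
        x_bar, y_bar = sum(x) / N, sum(y) / N
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        alpha_hat = y_bar - beta_hat * x_bar
        rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
        se_beta = (rss / (N - 2) / sxx) ** 0.5
        t_ratio = beta_hat / se_beta
        for size, crit in T_CRIT.items():
            if abs(t_ratio) > crit:
                rejections[size] += 1
    return {size: count / reps for size, count in rejections.items()}

print("H0 true,  beta = 0.0:", simulate_rejection_rates(0.0))  # type I error rates
print("H0 false, beta = 0.6:", simulate_rejection_rates(0.6))  # power = 1 - P(type II error)
```

Both runs use the same simulated samples at each test size, and the 1% critical value is larger, so moving from a 5% to a 1% test cannot raise either rejection rate. When H0 is true this means fewer type I errors. When H0 is false it means fewer correct rejections, i.e. more type II errors.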
A Special Type of Hypothesis Test: The t-ratio

• Recall that the formula for a test of significance approach to hypothesis testing using a t-test was
$$\text{test statistic} = \frac{\hat\beta_i - \beta_i^*}{SE(\hat\beta_i)}$$
• If the test is H0: βi = 0 against H1: βi ≠ 0, i.e. a test that the population coefficient is zero against a two-sided alternative, it is known as a t-ratio test. Since $\beta_i^* = 0$,
$$\text{test stat} = \frac{\hat\beta_i}{SE(\hat\beta_i)}$$
• The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.


The t-ratio: An Example
• Suppose that we have the following parameter estimates, standard errors and t-ratios for an intercept and slope, respectively.

               Intercept (β1)   Slope (β2)
Coefficient    1.10             -4.40
SE             1.35             0.96
t-ratio        0.81             -4.58

Compare these with a tcrit with 15 − 3 = 12 d.f. (2½% in each tail for a 5% test): 2.179 for a 5% test and 3.055 for a 1% test.
• Do we reject H0: β1 = 0? (No)
• Do we reject H0: β2 = 0? (Yes)
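The small Python sketch below recomputes each t-ratio as coefficient ÷ SE and applies both critical values quoted above. The function name `t_ratio_table` and the dictionary layout are our own illustrative choices.

```python
def t_ratio_table(estimates, t_crit):
    """Return {name: (t-ratio, {test size: reject H0: coefficient = 0?})}."""
    results = {}
    for name, (coef, se) in estimates.items():
        t_ratio = coef / se
        results[name] = (t_ratio, {size: abs(t_ratio) > crit for size, crit in t_crit.items()})
    return results

estimates = {"intercept (beta1)": (1.10, 1.35), "slope (beta2)": (-4.40, 0.96)}
t_crit = {"5%": 2.179, "1%": 3.055}   # two-sided critical values with 12 d.f., as quoted above
for name, (t_ratio, rejects) in t_ratio_table(estimates, t_crit).items():
    print(f"{name}: t = {t_ratio:.2f}, reject H0: {rejects}")
```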
What Does the t-ratio tell us?
• If we reject H0, we say that the result is significant.
• If the coefficient is not “significant” (e.g. the intercept coefficient in the last
regression above), then it means that the variable is not helping to explain
variations in y. Variables that are not significant are usually removed from the
regression model.
• In practice there are good statistical reasons for always including a constant, even if it is not significant. Look at what happens if no intercept is included:
[Figure: scatter of yt against xt with the fitted regression line forced through the origin]
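A small hypothetical illustration shows why. The data and function names below are ours, and the data lie exactly on y = 5 + 0.5x, so the true intercept is not zero. Fitting with an intercept recovers the slope 0.5. Forcing the line through the origin, where minimizing Σ(y − bx)² gives b = Σxy / Σx², distorts the slope to about 1.21.

```python
def slope_with_intercept(x, y):
    """OLS slope when an intercept is estimated: sum of cross-deviations / sum of squared deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def slope_through_origin(x, y):
    """OLS slope with no intercept: minimising sum (y - b*x)^2 gives b = sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

x = list(range(1, 11))
y = [5 + 0.5 * xi for xi in x]   # hypothetical data: true intercept 5, true slope 0.5
print(slope_with_intercept(x, y))   # 0.5
print(slope_through_origin(x, y))   # about 1.214 (= 467.5 / 385)
```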
The Exact Significance Level or p-value

• This is equivalent to choosing an infinite number of critical t-values from tables.
• It gives us the marginal significance level at which we would be indifferent between rejecting and not rejecting the null hypothesis.
• If the test statistic is large in absolute value, the p-value will be small, and vice versa. The p-value is the probability, if H0 is true, of obtaining a test statistic at least as extreme as the one observed. The smaller it is, the stronger the evidence against the null.
e.g. suppose a test statistic follows a t-distribution with 62 degrees of freedom and takes the value 1.47. The two-sided p-value is about 0.15.
✓ Do we reject at the 5% level? No
✓ Do we reject at the 10% level? No
✓ Do we reject at the 20% level? Yes
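Python's standard library has no t-distribution, so the following sketch computes two-sided p-values from the regularized incomplete beta function. It uses the identity $P(|T| > |t|) = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)$ for ν degrees of freedom. The continued fraction is evaluated with the standard textbook (modified Lentz) method. The function names are our own, and in practice a statistics library would normally be used instead.

```python
import math

def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_incomplete_beta(a, b, x):
    """I_x(a, b); uses the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for faster convergence."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b

def t_p_value(t_stat, df):
    """Two-sided p-value P(|T| >= |t_stat|) for a t-distribution with df degrees of freedom."""
    x = df / (df + t_stat * t_stat)
    return regularized_incomplete_beta(df / 2.0, 0.5, x)

print(round(t_p_value(1.47, 62), 3))   # the t_62 example above
```

As a sanity check, `t_p_value(2.086, 20)` and `t_p_value(2.179, 12)` should both come out close to 0.05, matching the tabulated critical values used earlier in this chapter.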
The Exact Significance Level or p-value
Consider the following data and regression results:
• Number of observations: 247
• Dependent variable: Y
• Coefficient estimates:

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -2.837834     1.488973     -1.905901     0.0578
X          1.001607      0.000999     1002.331      0.0000

• Thus the regression line can be estimated as Y = −2.8378 + 1.0016X
Question:
• At what level of significance will the hypothesis H0: β1 = 0 be rejected? Why?
Chapter End!
Consistency
• An additional, large-sample property of an estimator, beyond BLUE.
• An estimator is said to be consistent if its value approaches the true (population) parameter value as the sample size increases.
• An estimator is consistent if it satisfies two conditions:
a) It is asymptotically unbiased: any bias in the estimator disappears as the sample size gets larger.
b) Its variance converges to 0 as the sample size increases.
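Condition (b) can be seen in a small simulation. This is a sketch under hypothetical assumptions: the true model is y = 1 + 2x + u with u ~ N(0, 3²), x is drawn uniformly on [0, 10], there are 500 replications per sample size, and `slope_estimates` is our own name. OLS is unbiased here, so condition (a) holds at every n. The interesting output is the variance of $\hat\beta$ across replications, which shrinks roughly in proportion to 1/n.

```python
import random
import statistics

def slope_estimates(n, reps=500, beta=2.0, seed=0):
    """OLS slope estimates from `reps` simulated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Hypothetical true model: y = 1 + beta * x + u, u ~ N(0, 3^2), x ~ U(0, 10).
        x = [rng.uniform(0, 10) for _ in range(n)]
        y = [1.0 + beta * xi + rng.gauss(0, 3) for xi in x]
        x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        estimates.append(sxy / sxx)
    return estimates

for n in (10, 100, 1000):
    est = slope_estimates(n)
    print(f"n={n:5d}  mean={statistics.fmean(est):.4f}  variance={statistics.variance(est):.6f}")
```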



Other way of computing R2
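• We may write the observed Y as the sum of the predicted value ($\hat Y_i$) and the residual term ($e_i$):
$$\underbrace{Y_i}_{\text{observed}} = \underbrace{\hat Y_i}_{\text{predicted}} + \underbrace{e_i}_{\text{residual}}$$
• Subtracting $\bar Y$ from both sides gives the deviation form $y_i = \hat y_i + e_i$, where $y_i = Y_i - \bar Y$ and $\hat y_i = \hat Y_i - \bar Y$ (the mean of the fitted values equals $\bar Y$). Squaring and summing both sides, we obtain the following expression:
$$\sum y_i^2 = \sum(\hat y_i + e_i)^2 = \sum\hat y_i^2 + \sum e_i^2 + 2\sum\hat y_i e_i$$
where $\sum\hat y_i e_i = 0$. Therefore:
$$\underbrace{\sum y_i^2}_{\text{total variation}} = \underbrace{\sum\hat y_i^2}_{\text{explained variation}} + \underbrace{\sum e_i^2}_{\text{unexplained variation}}$$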
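Other way of computing R2
• Dividing both sides by $\sum y_i^2$:
$$\frac{\sum y_i^2}{\sum y_i^2} = \frac{\sum\hat y_i^2}{\sum y_i^2} + \frac{\sum e_i^2}{\sum y_i^2}$$
$$1 = \frac{\sum\hat y_i^2}{\sum y_i^2} + \frac{\sum e_i^2}{\sum y_i^2}, \qquad \text{where } \sum y_i^2 = TSS$$
$$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS}$$
• Thus,
$$R^2 = \frac{ESS}{TSS} = \frac{\sum\hat y_i^2}{\sum y_i^2}$$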
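The decomposition TSS = ESS + RSS, and the vanishing cross-product term, can be checked on a fitted regression. The following minimal sketch uses a hypothetical four-point sample, and `r2_decomposition` is our own name. All sums are taken in deviations from the mean.

```python
def r2_decomposition(x, y):
    """Return TSS, ESS, RSS and the cross-product sum(y_hat_dev * e) for an OLS fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    y_fit = [alpha_hat + beta_hat * xi for xi in x]          # predicted Y_i
    resid = [yi - fi for yi, fi in zip(y, y_fit)]            # residuals e_i
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # sum of y_i^2 (deviations)
    ess = sum((fi - y_bar) ** 2 for fi in y_fit)             # sum of y_hat_i^2 (deviations)
    rss = sum(e ** 2 for e in resid)                         # sum of e_i^2
    cross = sum((fi - y_bar) * e for fi, e in zip(y_fit, resid))  # should be ~0
    return tss, ess, rss, cross

tss, ess, rss, cross = r2_decomposition([0, 1, 2, 3], [1, 3, 2, 5])  # hypothetical data
print(f"TSS={tss:.4f} ESS={ess:.4f} RSS={rss:.4f} cross={cross:.1e}")
print(f"R2 = ESS/TSS = {ess / tss:.4f}, 1 - RSS/TSS = {1 - rss / tss:.4f}")
```

For this sample, TSS = 8.75, ESS = 6.05 and RSS = 2.7, so both expressions give R² ≈ 0.6914.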



A Note on the t and the Normal Distribution

[Figure: density curves of the normal distribution and the t-distribution]
▪ The normal distribution is characterized by its "bell" shape.
▪ We can scale a normal variate to have zero mean and unit variance by subtracting its mean and dividing by its standard deviation.
▪ There is a specific relationship between the t- and the standard normal distribution.
▪ Both are symmetrical and centred on zero.
▪ The t-distribution has another parameter, its degrees of freedom. We will always know this (for the time being, it is the number of observations − 2).
✓ As the number of observations gets larger, the t-distribution approaches the standard normal distribution.
✓ The reason for using the t-distribution rather than the standard normal is that we had to estimate σ, the SD of the disturbances.
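The convergence can be seen by evaluating both densities directly. The sketch below codes the t density $f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + t^2/\nu\right)^{-(\nu+1)/2}$ by hand; `t_pdf` is our own name. For the normal it uses `statistics.NormalDist` from the standard library. As ν grows, the t density at the centre rises towards the normal value and its tails thin out towards the normal tails.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom (computed on the log scale)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

normal = NormalDist(0, 1)
for df in (1, 5, 30, 1000):
    print(f"t, df={df:4d}: f(0)={t_pdf(0, df):.4f}  f(3)={t_pdf(3, df):.5f}")
print(f"standard normal: f(0)={normal.pdf(0):.4f}  f(3)={normal.pdf(3):.5f}")
```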
The Assumptions Underlying the Classical Linear Regression Model (CLRM)
• The model which we have used is known as the classical linear regression model.
• We observe data for xt, but since yt also depends on ut, we must be specific about
how the ut are generated.
• We usually make the following set of assumptions about the ut’s (the unobservable
error terms):
• Technical notation and interpretation:
1. $E(u_t) = 0$: the errors have zero mean.
2. $\operatorname{Var}(u_t) = \sigma^2$: the variance of the errors is constant and finite over all values of $x_t$.
3. $\operatorname{Cov}(u_i, u_j) = 0$ for $i \neq j$: the errors are uncorrelated with one another.
4. $\operatorname{Cov}(u_t, x_t) = 0$: there is no relationship between the error and the corresponding x variate.
The Assumptions Underlying the CLRM Again

• An alternative to assumption 4, which is slightly stronger, is that the $x_t$'s are non-stochastic, or fixed in repeated samples.
• A fifth assumption is required if we want to make inferences about the population parameters (the actual α and β) from the sample estimates ($\hat\alpha$ and $\hat\beta$).
• Additional assumption:
5. $u_t$ is normally distributed.
