F Test and Goodness of Fit

Investment and Financial Data Analysis 2017

Christopher Ting
http://mysmu.edu.sg/faculty/christophert/
christopherting@smu.edu.sg

Lee Kong Chian School of Business


Singapore Management University

February 10, 2017

Christopher Ting QF302 Week 6 February 10, 2017 1 / 21


Table of Contents

1 Tutorial on Multiple Linear Regression

2 F Tests

3 Goodness of Fit

Christopher Ting QF302 Week 6 February 10, 2017 2 / 21


Tutorial on Multiple Linear Regression

Multiple Linear Regression and the Constant Term

[ Model 2: y = Xβ + u. For t = 1, 2, . . . , T,

      yt = β1 + β2 X2,t + β3 X3,t + · · · + βK XK,t + ut

  There are K parameters, β1, β2, . . . , βK.

[ The first parameter is the y-intercept in Model 1, and the average
  of y in Model 0, with X1,t = 1 being a constant for all t:

      X1 = (1, 1, . . . , 1)'     (a T × 1 column of ones)

Christopher Ting QF302 Week 6 February 10, 2017 3 / 21


Tutorial on Multiple Linear Regression

Model 1: Simple Linear Regression

[ If K = 2, we are back to Model 1


     
      [ y1 ]   [ 1  x2,1 ]            [ u1 ]
      [ y2 ]   [ 1  x2,2 ]   [ β1 ]   [ u2 ]
      [ ·· ] = [ ·   ··  ]   [ β2 ] + [ ·· ]
      [ yT ]   [ 1  x2,T ]            [ uT ]
       T ×1       T ×2        2×1      T ×1

[ Notice that the matrices written in this way are conformable.
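[ As a quick illustration (not part of the original slides), here is a minimal
  Python/numpy sketch of building such a T × 2 design matrix by stacking a
  column of ones with a regressor; the data values below are made up.

import numpy as np

# Hypothetical regressor values for T = 5 observations
x2 = np.array([1.2, 0.7, 3.1, 2.4, 1.9])

# Stack the constant column X1 (all ones) with x2 to form the T x 2 matrix X
X = np.column_stack([np.ones_like(x2), x2])
print(X.shape)   # (5, 2), i.e. T x 2, conformable with a 2 x 1 beta vector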

Christopher Ting QF302 Week 6 February 10, 2017 4 / 21


Tutorial on Multiple Linear Regression

Numerical Illustration: Data

[ When K = 3,

yt = β1 + β2 x2,t + β3 x3,t + ut

for t = 1, 2, . . . , 15.
   
              [  2.0   3.5  −1.0 ]          [ −3.0 ]
  (X'X)⁻¹  =  [  3.5   1.0   6.5 ] ,  X'y = [  2.2 ]
              [ −1.0   6.5   4.3 ]          [  0.6 ]

[ The residual sum of squares (RSS) is û'û = 10.96.

Christopher Ting QF302 Week 6 February 10, 2017 5 / 21


Tutorial on Multiple Linear Regression

Calculations in Simple Regression


[ To calculate the coefficients, just multiply the matrix (X'X)⁻¹ by
  the vector X'y to obtain β̂ = (X'X)⁻¹ X'y:

      [  2.0   3.5  −1.0 ] [ −3.0 ]   [  1.10 ]
      [  3.5   1.0   6.5 ] [  2.2 ] = [ −4.40 ]
      [ −1.0   6.5   4.3 ] [  0.6 ]   [ 19.88 ]

[ To calculate the standard errors, we need an estimate of σu².

      s² = RSS / (T − K) = 10.96 / (15 − 3) = 0.91

[ The variance-covariance matrix of β̂ is given by

                                     [  1.82   3.19  −0.91 ]
      s²(X'X)⁻¹ = 0.91 (X'X)⁻¹   =   [  3.19   0.91   5.92 ]
                                     [ −0.91   5.92   3.91 ]
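[ A quick numerical check of the calculations on this slide, as a Python/numpy
  sketch; the only inputs assumed are the (X'X)⁻¹ matrix, X'y vector and RSS
  given above.

import numpy as np

XtX_inv = np.array([[ 2.0, 3.5, -1.0],
                    [ 3.5, 1.0,  6.5],
                    [-1.0, 6.5,  4.3]])
Xty = np.array([-3.0, 2.2, 0.6])

beta_hat = XtX_inv @ Xty          # (X'X)^-1 X'y  ->  [ 1.10, -4.40, 19.88]
s2 = 10.96 / (15 - 3)             # RSS / (T - K), approx. 0.91
var_cov = s2 * XtX_inv            # variance-covariance matrix of beta_hat
se = np.sqrt(np.diag(var_cov))    # standard errors from the leading diagonal
print(beta_hat, s2, se)           # matches the next slide up to rounding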

Christopher Ting QF302 Week 6 February 10, 2017 6 / 21


Tutorial on Multiple Linear Regression

Standard Errors and Estimated Model

[ The variances are on the leading diagonal:

      var(β̂1) = 1.82            SE(β̂1) = 1.35
      var(β̂2) = 0.91    ⇐⇒     SE(β̂2) = 0.95
      var(β̂3) = 3.91            SE(β̂3) = 1.98

[ We write (standard errors in parentheses):

      ŷ = 1.10 − 4.40 x2 + 19.88 x3
          (1.35)  (0.96)    (1.98)

Christopher Ting QF302 Week 6 February 10, 2017 7 / 21


F Tests

Testing Multiple Hypotheses: The F -test

\ We have used the t-test to test single hypotheses, i.e. hypotheses


involving only one coefficient. But what if we want to test more
than one coefficient simultaneously?

\ Answer: F -test, which involves estimating two regressions:

\ The unrestricted regression is the one in which the coefficients are


freely determined by the data, as we have done before.

\ The restricted regression is the one in which the coefficients are


restricted, i.e. the restrictions are imposed on some parameters of
β.

Christopher Ting QF302 Week 6 February 10, 2017 8 / 21


F Tests

An F -test Example

\ The general regression is

yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut

\ We want to test the restriction that β3 + β4 = 1 as we have some


theory suggesting this relationship.
\ The unrestricted regression is the general regression above, but what
is the restricted regression?

yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut s.t. β3 + β4 = 1

\ We substitute the restriction (β3 + β4 = 1) into the regression so


that it is automatically imposed on the data.
β3 + β4 = 1 ⇒ β4 = 1 − β3

Christopher Ting QF302 Week 6 February 10, 2017 9 / 21


F Tests

The F -test: Forming the Restricted Regression

\ Hence, the restricted regression is

yt = β1 + β2 x2,t + β3 x3,t + (1 − β3 )x4,t + ut


=⇒ yt = β1 + β2 x2,t + β3 x3,t + x4,t − β3 x4,t + ut
=⇒ (yt − x4,t ) = β1 + β2 x2,t + β3 (x3,t − x4,t ) + ut

\ For estimation, we create two new variables, Pt and Qt .

Pt = yt − x4,t
Qt = x3,t − x4,t

\ So
Pt = β1 + β2 x2,t + β3 Qt + ut
is the restricted regression we actually estimate.
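\ A sketch of how this could be done in practice with statsmodels, assuming a
hypothetical DataFrame df holding columns y, x2, x3 and x4 (the random data
below are placeholders).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["y", "x2", "x3", "x4"])

# Impose beta3 + beta4 = 1 by transforming the variables
df["P"] = df["y"] - df["x4"]        # P_t = y_t - x4_t
df["Q"] = df["x3"] - df["x4"]       # Q_t = x3_t - x4_t

restricted = smf.ols("P ~ x2 + Q", data=df).fit()          # restricted regression
unrestricted = smf.ols("y ~ x2 + x3 + x4", data=df).fit()  # unrestricted regression
print(restricted.ssr, unrestricted.ssr)                    # RRSS and URSS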

Christopher Ting QF302 Week 6 February 10, 2017 10 / 21


F Tests

Calculating the F -Test Statistic


\ The test statistic is given by

      test statistic = (RRSS − URSS) / URSS × (T − K) / m

  where URSS = RSS from the unrestricted regression
        RRSS = RSS from the restricted regression
        m    = number of restrictions
        T    = number of observations
        K    = number of regressors in the unrestricted regression,
               including the constant (i.e., the total number of
               parameters to be estimated).
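\ As a small sketch, the formula translates directly into one line of Python
(all inputs are assumed to be known).

def f_statistic(rrss, urss, m, T, K):
    """F-test statistic: (RRSS - URSS)/URSS * (T - K)/m."""
    return (rrss - urss) / urss * (T - K) / m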

Christopher Ting QF302 Week 6 February 10, 2017 11 / 21


F Tests

The F -Distribution

\ The test statistic follows the F -distribution, which has a pair of


degree-of-freedom parameters.

\ The values of the degrees-of-freedom parameters are m and
T − K, respectively.

\ Note that the order of the d.f. parameters is important.

\ The F -distribution has only positive values and is not symmetrical.

\ We therefore only reject the null if the test statistic > critical
F -value.
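\ In practice the critical value can be looked up with scipy; a sketch, using
illustrative values of m, T and K.

from scipy import stats

m, T, K = 2, 144, 4                                 # illustrative values
f_crit_5pct = stats.f.ppf(0.95, dfn=m, dfd=T - K)   # upper 5% point of F(m, T - K)
# reject H0 if the test statistic exceeds f_crit_5pct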

Christopher Ting QF302 Week 6 February 10, 2017 12 / 21


F Tests

Determining the Number of Restrictions


\ Examples:

      H0: hypothesis                     Number of restrictions, m
      β1 + β2 = 2                        1
      β2 = 1 and β3 = −1                 2
      β2 = 0, β3 = 0 and β4 = 0          3

\ If the model is
      yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut ,
  then the null hypothesis is
      H0: β2 = 0, β3 = 0 and β4 = 0,
  i.e., all of the coefficients except the intercept coefficient are zero.
\ Alternative hypothesis:
      H1: β2 ≠ 0, or β3 ≠ 0 or β4 ≠ 0
Christopher Ting QF302 Week 6 February 10, 2017 13 / 21
F Tests

The Relationship between the t- and the F -Distributions

\ Any hypothesis which could be tested with a t-test could have


been tested using an F -test, but not the other way around.
\ For example, consider the hypothesis
      H0: β2 = 0.5
      H1: β2 ≠ 0.5
  We could have tested this using the usual t-test,
      test stat = (β̂2 − 0.5) / SE(β̂2),
  or it could be tested in the framework above for the F -test.
\ Note that the two tests always give the same result since the
t-distribution is just a special case of the F -distribution.
\ For example, if we have some random variable Z with Z ∼ t(T − K),
then Z² ∼ F(1, T − K).
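\ A quick scipy sketch of this relationship (the degrees of freedom value is
illustrative).

from scipy import stats

dof = 140                                     # illustrative T - K
t_crit = stats.t.ppf(0.975, dof)              # two-sided 5% t critical value
f_crit = stats.f.ppf(0.95, dfn=1, dfd=dof)    # 5% critical value of F(1, T - K)
print(t_crit**2, f_crit)                      # the squared t value equals the F value
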
Christopher Ting QF302 Week 6 February 10, 2017 14 / 21
F Tests

F -test Example: Question

\ Suppose a researcher wants to test whether the returns y on a


company stock show unit sensitivity to two factors (factor x2 and
factor x3 ) among three considered.

\ The regression is carried out on 144 monthly observations.

\ The regression is

yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut .

– What are the restricted and unrestricted regressions?

– Given that the two RSS are 436.1 and 397.2 respectively, perform
the test.

Christopher Ting QF302 Week 6 February 10, 2017 15 / 21


F Tests

F -test Example: Solution

\ Unit sensitivity implies H0 : β2 = 1 and β3 = 1.


\ The unrestricted regression is the one in the question.
\ The restricted regression is (yt − x2,t − x3,t ) = β1 + β4 x4,t + ut or
letting zt = yt − x2,t − x3,t , the restricted regression is
zt = β1 + β4 x4,t + ut

\ In the F -test formula, T =144, K=4, m=2, RRSS=436.1,


URSS=397.2
\ F -test statistic = 6.86. The critical values are F(2,140) = 3.07 at the
5% level and 4.79 at the 1% level.

\ Conclusion: Reject H0 .
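\ The arithmetic behind the test statistic, as a short sketch using the numbers
given in the question.

rrss, urss, m, T, K = 436.1, 397.2, 2, 144, 4
f_stat = (rrss - urss) / urss * (T - K) / m
print(round(f_stat, 2))    # approx. 6.86, well above the 1% critical value of 4.79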

Christopher Ting QF302 Week 6 February 10, 2017 16 / 21


Goodness of Fit

Goodness of Fit Statistics

] The most common goodness of fit statistic is known as R2 . One


way to define R2 is to say that it is the square of the correlation
coefficient between yt and ŷt .

] Recall that what we are interested in doing is explaining the


variability of y about its mean value, i.e., the total sum of squares,
  TSS:
      TSS = Σt (yt − ȳ)²

] We can split the TSS into two parts: the part which we have
explained (known as the explained sum of squares, ESS) and the
part which we did not explain using the model (the RSS).

Christopher Ting QF302 Week 6 February 10, 2017 17 / 21


Goodness of Fit

Defining R2
] That is,

      TSS = ESS + RSS
      Σt (yt − ȳ)² = Σt (ŷt − ȳ)² + Σt ût²

] Our goodness of fit statistic is R² = ESS / TSS.

] But since TSS = ESS + RSS, we can also write

      R² = (TSS − RSS) / TSS = 1 − RSS / TSS

] R² must always lie between zero and one. To understand this,
  consider two extremes:

      RSS = TSS, i.e. ESS = 0, so R² = ESS/TSS = 0
      ESS = TSS, i.e. RSS = 0, so R² = ESS/TSS = 1
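] A minimal Python sketch of the definition, assuming y holds the observations
  and y_hat the fitted values.

import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS, with TSS and RSS defined as above."""
    tss = np.sum((y - np.mean(y)) ** 2)
    rss = np.sum((y - y_hat) ** 2)
    return 1.0 - rss / tss
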
Christopher Ting QF302 Week 6 February 10, 2017 18 / 21
Goodness of Fit

Problems with R2 as a Goodness of Fit Measure

1 R2 is defined in terms of variation about the mean of y so that if a


model is reparameterised (rearranged) and the dependent
variable changes, R2 will change.

2 R2 never falls if more regressors are added to the regression, e.g.,
  consider:

      Regression 1 : yt = β1 + β2 x2,t + β3 x3,t + ut
      Regression 2 : yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut

R2 will always be at least as high for regression 2 relative to


regression 1.

Christopher Ting QF302 Week 6 February 10, 2017 19 / 21


Goodness of Fit

Adjusted R2

] In order to get around these problems, a modification is made to


take into account the loss of degrees of freedom associated with
adding extra variables. This is known as R̄2 , or adjusted R2:
 
      R̄² = 1 − [(T − 1) / (T − K)] (1 − R²)

] So if we add an extra regressor, K increases and unless R2


increases by a more than offsetting amount, R̄2 will actually fall.
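] A one-line sketch of the adjustment, taking R², T and K as given.

def adjusted_r_squared(r2, T, K):
    """R-bar^2 = 1 - ((T - 1)/(T - K)) * (1 - R^2)."""
    return 1.0 - (T - 1) / (T - K) * (1.0 - r2)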

] But there are still problems with this criterion: there is no
distribution theory for R̄² or R².

Christopher Ting QF302 Week 6 February 10, 2017 20 / 21


Goodness of Fit

Akaike Information Criterion (1973)

] For i.i.d. normally distributed errors,

      AIC = T ln(RSS / T) + 2K.

] For small sample sizes (T/K ≤ 40), use the second-order AIC:

      AICc = AIC + 2K(K + 1) / (T − K − 1)

] The smaller the AIC, the better the model fits the data without
over-fitting.
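] A short sketch of both formulas, with RSS, T and K assumed given.

import numpy as np

def aic(rss, T, K):
    """AIC = T * ln(RSS/T) + 2K, assuming i.i.d. normal errors."""
    return T * np.log(rss / T) + 2 * K

def aic_c(rss, T, K):
    """Small-sample (second-order) correction: AIC + 2K(K + 1)/(T - K - 1)."""
    return aic(rss, T, K) + 2 * K * (K + 1) / (T - K - 1)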

Christopher Ting QF302 Week 6 February 10, 2017 21 / 21
