F Test and Goodness of Fit

Investment and Financial Data Analysis 2017

Christopher Ting
http://mysmu.edu.sg/faculty/christophert/
christopherting@smu.edu.sg

Lee Kong Chian School of Business


Singapore Management University

February 10, 2017

Christopher Ting QF302 Week 6 February 10, 2017 1 / 21


Table of Contents

1 Tutorial on Multiple Linear Regression

2 F Tests

3 Goodness of Fit

Christopher Ting QF302 Week 6 February 10, 2017 2 / 21


Tutorial on Multiple Linear Regression

Multiple Linear Regression and the Constant Term

[ Model 2: y = Xβ + u. For t = 1, 2, . . . , T,

      yt = β1 + β2 X2,t + β3 X3,t + · · · + βK XK,t + ut

  There are K parameters, β1, β2, . . . , βK.

[ The first parameter is the y-intercept in Model 1, and the average
  of y in Model 0, with X1,t = 1 being a constant for all t:

      X1 = (1, 1, . . . , 1)'     (a T × 1 column of ones)

Christopher Ting QF302 Week 6 February 10, 2017 3 / 21


Tutorial on Multiple Linear Regression

Model 1: Simple Linear Regression

[ If K = 2, we are back to Model 1


     
      [ y1 ]   [ 1  x2,1 ]            [ u1 ]
      [ y2 ]   [ 1  x2,2 ]   [ β1 ]   [ u2 ]
      [ ·· ] = [ ·   ··  ]   [ β2 ] + [ ·· ]
      [ yT ]   [ 1  x2,T ]            [ uT ]
       T ×1       T ×2        2×1      T ×1

[ Notice that the matrices written in this way are conformable.
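[ As a quick illustration (not part of the original slides), here is a minimal
  Python/numpy sketch of building such a T × 2 design matrix by stacking a
  column of ones with a regressor; the data values below are made up.

import numpy as np

# Hypothetical regressor values for T = 5 observations
x2 = np.array([1.2, 0.7, 3.1, 2.4, 1.9])

# Stack the constant column X1 (all ones) with x2 to form the T x 2 matrix X
X = np.column_stack([np.ones_like(x2), x2])
print(X.shape)   # (5, 2), i.e. T x 2, conformable with a 2 x 1 beta vector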

Christopher Ting QF302 Week 6 February 10, 2017 4 / 21


Tutorial on Multiple Linear Regression

Numerical Illustration: Data

[ When K = 3,

yt = β1 + β2 x2,t + β3 x3,t + ut

for t = 1, 2, . . . , 15.
   
              [  2.0   3.5  −1.0 ]          [ −3.0 ]
  (X'X)⁻¹  =  [  3.5   1.0   6.5 ] ,  X'y = [  2.2 ]
              [ −1.0   6.5   4.3 ]          [  0.6 ]

[ The residual sum of squares (RSS) is û'û = 10.96.

Christopher Ting QF302 Week 6 February 10, 2017 5 / 21


Tutorial on Multiple Linear Regression

Calculations in Simple Regression


[ To calculate the coefficients, just multiply the matrix (X'X)⁻¹ by
  the vector X'y to obtain β̂ = (X'X)⁻¹ X'y:

      [  2.0   3.5  −1.0 ] [ −3.0 ]   [  1.10 ]
      [  3.5   1.0   6.5 ] [  2.2 ] = [ −4.40 ]
      [ −1.0   6.5   4.3 ] [  0.6 ]   [ 19.88 ]

[ To calculate the standard errors, we need an estimate of σu².

      s² = RSS / (T − K) = 10.96 / (15 − 3) = 0.91

[ The variance-covariance matrix of β̂ is given by

                                     [  1.82   3.19  −0.91 ]
      s²(X'X)⁻¹ = 0.91 (X'X)⁻¹   =   [  3.19   0.91   5.92 ]
                                     [ −0.91   5.92   3.91 ]
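[ A quick numerical check of the calculations on this slide, as a Python/numpy
  sketch; the only inputs assumed are the (X'X)⁻¹ matrix, X'y vector and RSS
  given above.

import numpy as np

XtX_inv = np.array([[ 2.0, 3.5, -1.0],
                    [ 3.5, 1.0,  6.5],
                    [-1.0, 6.5,  4.3]])
Xty = np.array([-3.0, 2.2, 0.6])

beta_hat = XtX_inv @ Xty          # (X'X)^-1 X'y  ->  [ 1.10, -4.40, 19.88]
s2 = 10.96 / (15 - 3)             # RSS / (T - K), approx. 0.91
var_cov = s2 * XtX_inv            # variance-covariance matrix of beta_hat
se = np.sqrt(np.diag(var_cov))    # standard errors from the leading diagonal
print(beta_hat, s2, se)           # matches the next slide up to rounding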

Christopher Ting QF302 Week 6 February 10, 2017 6 / 21


Tutorial on Multiple Linear Regression

Standard Errors and Estimated Model

[ The variances are on the leading diagonal:

      var(β̂1) = 1.82            SE(β̂1) = 1.35
      var(β̂2) = 0.91    ⇐⇒     SE(β̂2) = 0.95
      var(β̂3) = 3.91            SE(β̂3) = 1.98

[ We write (standard errors in parentheses):

      ŷ = 1.10 − 4.40 x2 + 19.88 x3
          (1.35)  (0.96)    (1.98)

Christopher Ting QF302 Week 6 February 10, 2017 7 / 21


F Tests

Testing Multiple Hypotheses: The F -test

\ We have used the t-test to test single hypotheses, i.e. hypotheses


involving only one coefficient. But what if we want to test more
than one coefficient simultaneously?

\ Answer: F -test, which involves estimating two regressions:

\ The unrestricted regression is the one in which the coefficients are


freely determined by the data, as we have done before.

\ The restricted regression is the one in which the coefficients are


restricted, i.e. the restrictions are imposed on some parameters of
β.

Christopher Ting QF302 Week 6 February 10, 2017 8 / 21


F Tests

An F -test Example

\ The general regression is

yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut

\ We want to test the restriction that β3 + β4 = 1 as we have some


theory suggesting this relationship.
\ The unrestricted regression is the general regression above, but what
is the restricted regression?

yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut s.t. β3 + β4 = 1

\ We substitute the restriction (β3 + β4 = 1) into the regression so


that it is automatically imposed on the data.
β3 + β4 = 1 ⇒ β4 = 1 − β3

Christopher Ting QF302 Week 6 February 10, 2017 9 / 21


F Tests

The F -test: Forming the Restricted Regression

\ Hence, the restricted regression is

yt = β1 + β2 x2,t + β3 x3,t + (1 − β3 )x4,t + ut


=⇒ yt = β1 + β2 x2,t + β3 x3,t + x4,t − β3 x4,t + ut
=⇒ (yt − x4,t ) = β1 + β2 x2,t + β3 (x3,t − x4,t ) + ut

\ For estimation, we create two new variables, Pt and Qt .

Pt = yt − x4,t
Qt = x3,t − x4,t

\ So
Pt = β1 + β2 x2,t + β3 Qt + ut
is the restricted regression we actually estimate.
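\ A sketch of how this could be done in practice with statsmodels, assuming a
hypothetical DataFrame df holding columns y, x2, x3 and x4 (the random data
below are placeholders).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["y", "x2", "x3", "x4"])

# Impose beta3 + beta4 = 1 by transforming the variables
df["P"] = df["y"] - df["x4"]        # P_t = y_t - x4_t
df["Q"] = df["x3"] - df["x4"]       # Q_t = x3_t - x4_t

restricted = smf.ols("P ~ x2 + Q", data=df).fit()          # restricted regression
unrestricted = smf.ols("y ~ x2 + x3 + x4", data=df).fit()  # unrestricted regression
print(restricted.ssr, unrestricted.ssr)                    # RRSS and URSS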

Christopher Ting QF302 Week 6 February 10, 2017 10 / 21


F Tests

Calculating the F -Test Statistic


\ The test statistic is given by

      test statistic = (RRSS − URSS) / URSS × (T − K) / m

  where URSS = RSS from the unrestricted regression
        RRSS = RSS from the restricted regression
        m    = number of restrictions
        T    = number of observations
        K    = number of regressors in the unrestricted regression,
               including the constant (i.e., the total number of
               parameters to be estimated).
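\ As a small sketch, the formula translates directly into one line of Python
(all inputs are assumed to be known).

def f_statistic(rrss, urss, m, T, K):
    """F-test statistic: (RRSS - URSS)/URSS * (T - K)/m."""
    return (rrss - urss) / urss * (T - K) / m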

Christopher Ting QF302 Week 6 February 10, 2017 11 / 21


F Tests

The F -Distribution

\ The test statistic follows the F -distribution, which has a pair of


degree-of-freedom parameters.

\ The values of the degrees-of-freedom parameters are m and
T − K, respectively.

\ Note that the order of the d.f. parameters is important.

\ The F -distribution has only positive values and is not symmetrical.

\ We therefore only reject the null if the test statistic > critical
F -value.
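\ In practice the critical value can be looked up with scipy; a sketch, using
illustrative values of m, T and K.

from scipy import stats

m, T, K = 2, 144, 4                                 # illustrative values
f_crit_5pct = stats.f.ppf(0.95, dfn=m, dfd=T - K)   # upper 5% point of F(m, T - K)
# reject H0 if the test statistic exceeds f_crit_5pct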

Christopher Ting QF302 Week 6 February 10, 2017 12 / 21


F Tests

Determining the Number of Restrictions


\ Examples:

      H0: hypothesis                     Number of restrictions, m
      β1 + β2 = 2                        1
      β2 = 1 and β3 = −1                 2
      β2 = 0, β3 = 0 and β4 = 0          3

\ If the model is
      yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut ,
  then the null hypothesis is
      H0: β2 = 0, β3 = 0 and β4 = 0,
  i.e., all of the coefficients except the intercept coefficient are zero.
\ Alternative hypothesis:
      H1: β2 ≠ 0, or β3 ≠ 0 or β4 ≠ 0
Christopher Ting QF302 Week 6 February 10, 2017 13 / 21
F Tests

The Relationship between the t- and the F -Distributions

\ Any hypothesis which could be tested with a t-test could have


been tested using an F -test, but not the other way around.
\ For example, consider the hypothesis
      H0: β2 = 0.5
      H1: β2 ≠ 0.5
  We could have tested this using the usual t-test,
      test stat = (β̂2 − 0.5) / SE(β̂2),
  or it could be tested in the framework above for the F -test.
\ Note that the two tests always give the same result since the
t-distribution is just a special case of the F -distribution.
\ For example, if we have some random variable Z with Z ∼ t(T − K),
then Z² ∼ F(1, T − K).
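\ A quick scipy sketch of this relationship (the degrees of freedom value is
illustrative).

from scipy import stats

dof = 140                                     # illustrative T - K
t_crit = stats.t.ppf(0.975, dof)              # two-sided 5% t critical value
f_crit = stats.f.ppf(0.95, dfn=1, dfd=dof)    # 5% critical value of F(1, T - K)
print(t_crit**2, f_crit)                      # the squared t value equals the F value
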
Christopher Ting QF302 Week 6 February 10, 2017 14 / 21
F Tests

F -test Example: Question

\ Suppose a researcher wants to test whether the returns y on a


company stock show unit sensitivity to two factors (factor x2 and
factor x3 ) among three considered.

\ The regression is carried out on 144 monthly observations.

\ The regression is

yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut .

– What are the restricted and unrestricted regressions?

– Given that the two RSS are 436.1 and 397.2 respectively, perform
the test.

Christopher Ting QF302 Week 6 February 10, 2017 15 / 21


F Tests

F -test Example: Solution

\ Unit sensitivity implies H0 : β2 = 1 and β3 = 1.


\ The unrestricted regression is the one in the question.
\ The restricted regression is (yt − x2,t − x3,t ) = β1 + β4 x4,t + ut or
letting zt = yt − x2,t − x3,t , the restricted regression is
zt = β1 + β4 x4,t + ut

\ In the F -test formula, T =144, K=4, m=2, RRSS=436.1,


URSS=397.2
\ F -test statistic = 6.86. The critical values are F(2,140) = 3.07 at the
5% level and 4.79 at the 1% level.

\ Conclusion: Reject H0 .
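\ The arithmetic behind the test statistic, as a short sketch using the numbers
given in the question.

rrss, urss, m, T, K = 436.1, 397.2, 2, 144, 4
f_stat = (rrss - urss) / urss * (T - K) / m
print(round(f_stat, 2))    # approx. 6.86, well above the 1% critical value of 4.79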

Christopher Ting QF302 Week 6 February 10, 2017 16 / 21


Goodness of Fit

Goodness of Fit Statistics

] The most common goodness of fit statistic is known as R2 . One


way to define R2 is to say that it is the square of the correlation
coefficient between yt and ŷt .

] Recall that what we are interested in doing is explaining the


variability of y about its mean value, i.e., the total sum of squares,
  TSS:
      TSS = Σt (yt − ȳ)²

] We can split the TSS into two parts: the part which we have
explained (known as the explained sum of squares, ESS) and the
part which we did not explain using the model (the RSS).

Christopher Ting QF302 Week 6 February 10, 2017 17 / 21


Goodness of Fit

Defining R2
] That is,

      TSS = ESS + RSS
      Σt (yt − ȳ)² = Σt (ŷt − ȳ)² + Σt ût²

] Our goodness of fit statistic is R² = ESS / TSS.

] But since TSS = ESS + RSS, we can also write

      R² = (TSS − RSS) / TSS = 1 − RSS / TSS

] R² must always lie between zero and one. To understand this,
  consider two extremes:

      RSS = TSS, i.e. ESS = 0, so R² = ESS/TSS = 0
      ESS = TSS, i.e. RSS = 0, so R² = ESS/TSS = 1
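] A minimal Python sketch of the definition, assuming y holds the observations
  and y_hat the fitted values.

import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS, with TSS and RSS defined as above."""
    tss = np.sum((y - np.mean(y)) ** 2)
    rss = np.sum((y - y_hat) ** 2)
    return 1.0 - rss / tss
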
Christopher Ting QF302 Week 6 February 10, 2017 18 / 21
Goodness of Fit

Problems with R2 as a Goodness of Fit Measure

1 R2 is defined in terms of variation about the mean of y so that if a


model is reparameterised (rearranged) and the dependent
variable changes, R2 will change.

2 R2 never falls if more regressors are added to the regression, e.g.,
  consider:

      Regression 1 : yt = β1 + β2 x2,t + β3 x3,t + ut
      Regression 2 : yt = β1 + β2 x2,t + β3 x3,t + β4 x4,t + ut

R2 will always be at least as high for regression 2 relative to


regression 1.

Christopher Ting QF302 Week 6 February 10, 2017 19 / 21


Goodness of Fit

Adjusted R2

] In order to get around these problems, a modification is made to


take into account the loss of degrees of freedom associated with
adding extra variables. This is known as R̄2 , or adjusted R2:
 
      R̄² = 1 − [(T − 1) / (T − K)] (1 − R²)

] So if we add an extra regressor, K increases and unless R2


increases by a more than offsetting amount, R̄2 will actually fall.
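] A one-line sketch of the adjustment, taking R², T and K as given.

def adjusted_r_squared(r2, T, K):
    """R-bar^2 = 1 - ((T - 1)/(T - K)) * (1 - R^2)."""
    return 1.0 - (T - 1) / (T - K) * (1.0 - r2)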

] But there are still problems with this criterion: there is no
distribution theory for R̄² or R².

Christopher Ting QF302 Week 6 February 10, 2017 20 / 21


Goodness of Fit

Akaike Information Criterion (1973)

] For i.i.d. normally distributed errors,

      AIC = T ln(RSS / T) + 2K.

] For small sample sizes (T/K ≤ 40), use the second-order AIC:

      AICc = AIC + 2K(K + 1) / (T − K − 1)

] The smaller the AIC, the better the model fits the data without
over-fitting.
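] A short sketch of both formulas, with RSS, T and K assumed given.

import numpy as np

def aic(rss, T, K):
    """AIC = T * ln(RSS/T) + 2K, assuming i.i.d. normal errors."""
    return T * np.log(rss / T) + 2 * K

def aic_c(rss, T, K):
    """Small-sample (second-order) correction: AIC + 2K(K + 1)/(T - K - 1)."""
    return aic(rss, T, K) + 2 * K * (K + 1) / (T - K - 1)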

Christopher Ting QF302 Week 6 February 10, 2017 21 / 21
