Multiple Hypothesis Testing


MULTIPLE HYPOTHESIS TESTING

Suppose
$$Y_t = \beta_0 + \beta_1 X_{t,1} + \beta_2 X_{t,2} + \beta_3 X_{t,3} + \varepsilon_t$$
Is the joint effect of $X_{t,1}$ and $X_{t,2}$ irrelevant once $X_{t,3}$ is accounted for?
We do a test:

$$\begin{cases} H_0: \beta_1 = 0 \ \text{and} \ \beta_2 = 0 \\ H_1: \beta_1 \neq 0 \ \text{or} \ \beta_2 \neq 0 \end{cases}$$

We cannot proceed using the t-statistic because it is designed to test the significance of a single regressor.
What do we do?
We define an unrestricted model, i.e. the initial model with $K$ regressors, and a restricted model, i.e. the model without the $q$ regressors we want to test jointly.
Then we calculate the SSR of the unrestricted and of the restricted model. Of course, $SSR_R \geq SSR_U$: the unrestricted model fits better (the smaller the SSR, the better the fit and the higher the $R^2$).
In the end, we calculate

$$F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(T-K-1)} \geq 0$$

where $q$ is the number of restrictions (in the example, $q = 2$).


Watch out: K is the number of regressors in the unrestricted model!

Under the null (under which the coefficients of the last $q$ regressors are equal to 0), the statistic $F$ has a Fisher (F) distribution with degrees of freedom $d_1 = q$ and $d_2 = T - K - 1$. It is always non-negative because $SSR_R \geq SSR_U$.

If $F$ is too high, $SSR_R$ is much larger than $SSR_U$, so removing the last $q$ regressors costs too much in a statistical sense. We therefore reject the null hypothesis: the additional $q$ regressors are jointly significant at the chosen significance level, and we should rely on the unrestricted model.
If $F$ is not too high, the statistical price of the restriction is small, so we do not reject the null hypothesis: the additional $q$ regressors are jointly not significant at the chosen significance level, and we can rely on the restricted model.
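
A minimal numerical sketch of this F-test (the simulated data, variable names and DGP below are illustrative assumptions, not part of the notes):

```python
# F-test for the joint null beta_1 = beta_2 = 0, on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T = 200
X = rng.normal(size=(T, 3))                      # X_{t,1}, X_{t,2}, X_{t,3}
y = 1.0 + 0.5 * X[:, 2] + rng.normal(size=T)     # only X_3 matters in this DGP

def ssr(y, X_design):
    """Sum of squared residuals from an OLS fit of y on X_design."""
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    resid = y - X_design @ beta
    return resid @ resid

ones = np.ones((T, 1))
X_u = np.hstack([ones, X])                       # unrestricted: constant + 3 regressors
X_r = np.hstack([ones, X[:, [2]]])               # restricted: drops X_1 and X_2

K = 3                                            # regressors in the unrestricted model
q = 2                                            # number of restrictions
SSR_U, SSR_R = ssr(y, X_u), ssr(y, X_r)

F = ((SSR_R - SSR_U) / q) / (SSR_U / (T - K - 1))
p_value = stats.f.sf(F, q, T - K - 1)            # reject H0 if p_value < alpha
print(F, p_value)
```

A large $F$ (small p-value) leads to rejecting the joint null; otherwise we keep the restricted model.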

How can we test whether $\beta_1 = \beta_2$?


We do a test:

$$\begin{cases} H_0: \beta_1 = \beta_2 \\ H_1: \beta_1 \neq \beta_2 \end{cases}$$
Consider the regression
$$Y_t = \beta_0 + \beta_1 X_{t,1} + \beta_2 X_{t,2} + \beta_3 X_{t,3} + \varepsilon_t,$$
which can be rewritten (adding and subtracting $\beta_2 X_{t,1}$) as
$$Y_t = \beta_0 + \beta_1 X_{t,1} - \beta_2 X_{t,1} + \beta_2 X_{t,1} + \beta_2 X_{t,2} + \beta_3 X_{t,3} + \varepsilon_t = \beta_0 + (\beta_1 - \beta_2) X_{t,1} + \beta_2 (X_{t,1} + X_{t,2}) + \beta_3 X_{t,3} + \varepsilon_t.$$
Consider the auxiliary regression model
$$Y_t = \beta'_0 + \beta'_1 X_{t,1} + \beta'_2 (X_{t,1} + X_{t,2}) + \beta'_3 X_{t,3} + \varepsilon_t,$$
where $\beta'_1 = \beta_1 - \beta_2$.
'
Thanks to it, we can run a single-hypothesis test on $\beta'_1$:

$$\begin{cases} H'_0: \beta'_1 = 0 \\ H'_1: \beta'_1 \neq 0 \end{cases}$$
If we reject the null hypothesis, $\beta_1 \neq \beta_2$. If we do not reject it, $\beta_1 = \beta_2$.
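
A minimal sketch of this reparameterized test on simulated data (the names, the simulated coefficients and the auxiliary regressor `W` below are illustrative assumptions): regress $Y_t$ on $X_{t,1}$, $X_{t,1}+X_{t,2}$ and $X_{t,3}$, then read off the t-statistic of the coefficient on $X_{t,1}$.

```python
# t-test on beta'_1 = beta_1 - beta_2 via the auxiliary regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T = 200
X = rng.normal(size=(T, 3))
y = 1.0 + 0.8 * X[:, 0] + 0.8 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=T)  # here beta_1 = beta_2

W = X[:, 0] + X[:, 1]                                     # W_t = X_{t,1} + X_{t,2}
Z = np.column_stack([np.ones(T), X[:, 0], W, X[:, 2]])    # auxiliary regressors

beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ beta
K = Z.shape[1] - 1                                        # regressors excluding the constant
s2 = resid @ resid / (T - K - 1)                          # OLS error-variance estimate
cov = s2 * np.linalg.inv(Z.T @ Z)

t_stat = beta[1] / np.sqrt(cov[1, 1])                     # coefficient on X_1 is beta'_1
p_value = 2 * stats.t.sf(abs(t_stat), T - K - 1)
print(t_stat, p_value)                                    # large p-value: cannot reject beta_1 = beta_2
```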

The likelihood function


The likelihood function is the joint probability density of the observed data $(X_1, Y_1), \dots, (X_T, Y_T)$, and it is a function of the parameters of the assumed DGP.

Assume
1) $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$
2) $\varepsilon_t = Y_t - \beta_0 - \beta_1 X_t \sim N(0, \sigma^2_\varepsilon)$, and the density of a Gaussian r.v. is known:
$$f(\varepsilon_t) = \frac{1}{\sqrt{2\pi\sigma^2_\varepsilon}} \exp\left( -\frac{\varepsilon_t^2}{2\sigma^2_\varepsilon} \right)$$
3) $\varepsilon_t$ i.i.d.

Since we have $T$ errors and we assume that they are independent, the joint density function is the product of the individual density functions. Therefore, the likelihood function of the unknown parameters $\beta_0, \beta_1, \sigma_\varepsilon$ is

$$L(\beta_0, \beta_1, \sigma_\varepsilon) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2_\varepsilon}} \exp\left( -\frac{(Y_t - \beta_0 - \beta_1 X_t)^2}{2\sigma^2_\varepsilon} \right)$$

The estimates of the unknown parameters are found by maximizing (not minimizing!) this function. We will see that this coincides with minimizing the sum of squared residuals, as in the OLS case: when the errors are Gaussian, MLE and OLS coincide. We will also see, however, that the MLE of the error variance does not coincide with the OLS estimator of the error variance.
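
A minimal sketch of this point on simulated data (the DGP and the use of `scipy.optimize.minimize` are illustrative choices): numerically minimizing the negative log-likelihood recovers the same intercept and slope as OLS.

```python
# MLE of the simple regression by minimizing the negative log-likelihood.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
T = 300
x = rng.normal(size=T)
y = 2.0 + 1.5 * x + rng.normal(scale=0.7, size=T)

def neg_log_lik(params):
    b0, b1, log_sigma = params                 # sigma parametrized on the log scale to keep it positive
    sigma = np.exp(log_sigma)
    return -np.sum(stats.norm.logpdf(y - b0 - b1 * x, scale=sigma))

res = optimize.minimize(neg_log_lik, x0=[0.0, 0.0, 0.0])
b0_mle, b1_mle, sigma_mle = res.x[0], res.x[1], np.exp(res.x[2])

# OLS for comparison: the coefficient estimates coincide (up to numerical tolerance)
b1_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b0_ols = y.mean() - b1_ols * x.mean()
print(b0_mle, b1_mle, "vs OLS:", b0_ols, b1_ols)
```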

Assume
1) $Y_t = \beta_0 + \beta_1 X_{t,1} + \dots + \beta_K X_{t,K} + \varepsilon_t$
2) $\varepsilon_t = Y_t - \beta_0 - \beta_1 X_{t,1} - \dots - \beta_K X_{t,K} \sim N(0, \sigma^2_\varepsilon)$
3) $\varepsilon_t$ i.i.d.

The likelihood is

$$L(\beta_0, \dots, \beta_K, \sigma_\varepsilon) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2_\varepsilon}} \exp\left( -\frac{(Y_t - \beta_0 - \beta_1 X_{t,1} - \dots - \beta_K X_{t,K})^2}{2\sigma^2_\varepsilon} \right)$$

To maximize the likelihood we work with the log-likelihood (it is easier), and maximizing the log-likelihood is equivalent to minimizing minus the log-likelihood. So

$$-\log(L) = -\mathcal{L} = -\log\left( \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2_\varepsilon}} \exp\left( -\frac{(Y_t - \beta_0 - \beta_1 X_{t,1} - \dots - \beta_K X_{t,K})^2}{2\sigma^2_\varepsilon} \right) \right) =$$
$$= \frac{T}{2}\log(2\pi) + T\log(\sigma_\varepsilon) + \frac{\sum_{t=1}^{T} (Y_t - \beta_0 - \beta_1 X_{t,1} - \dots - \beta_K X_{t,K})^2}{2\sigma^2_\varepsilon},$$

where $\frac{T}{2}\log(2\pi)$ is a constant, so we will call it $C$.

The MLEs (Maximum Likelihood Estimators) of the multiple regression model are defined as

$$(\tilde{\beta}_0, \tilde{\beta}_1, \dots, \tilde{\beta}_K, \tilde{\sigma}_\varepsilon) = \underset{\beta_0, \dots, \beta_K, \sigma_\varepsilon}{\operatorname{argmin}} \; \bigl(-\mathcal{L}(\beta_0, \dots, \beta_K, \sigma_\varepsilon)\bigr)$$

This minimization (differentiating with respect to $\sigma_\varepsilon$) is solved by

$$\tilde{\sigma}^2_\varepsilon = \frac{1}{T}\sum_{t=1}^{T} \bigl(Y_t - \tilde{\beta}_0 - \tilde{\beta}_1 X_{t,1} - \dots - \tilde{\beta}_K X_{t,K}\bigr)^2$$
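
As a quick check (a standard step, spelled out here for reference), differentiating the negative log-likelihood above with respect to $\sigma_\varepsilon$ and setting the derivative to zero gives exactly this estimator:

$$\frac{\partial(-\mathcal{L})}{\partial \sigma_\varepsilon} = \frac{T}{\sigma_\varepsilon} - \frac{1}{\sigma_\varepsilon^{3}}\sum_{t=1}^{T}\bigl(Y_t - \beta_0 - \beta_1 X_{t,1} - \dots - \beta_K X_{t,K}\bigr)^2 = 0 \;\;\Longrightarrow\;\; \tilde{\sigma}^2_\varepsilon = \frac{1}{T}\sum_{t=1}^{T}\bigl(Y_t - \tilde{\beta}_0 - \dots - \tilde{\beta}_K X_{t,K}\bigr)^2$$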

The MLE of the error variance is the sample average of the squared residuals (division by $T$), while the OLS estimator of the error variance divides the sum of squared residuals by $T - K - 1$. This reflects the fact that the OLS estimator is unbiased, while the MLE is biased; however, as $T \to +\infty$, the MLE becomes asymptotically unbiased.

The remaining first-order conditions (FOCs) are equivalent to

$$(\tilde{\beta}_0, \tilde{\beta}_1, \dots, \tilde{\beta}_K) = \underset{\beta_0, \dots, \beta_K}{\operatorname{argmin}} \sum_{t=1}^{T} \bigl(Y_t - \beta_0 - \beta_1 X_{t,1} - \dots - \beta_K X_{t,K}\bigr)^2$$

Conclusion: the ML estimators of the regression coefficients coincide with the OLS estimators because of the Gaussian assumption on the errors; the only difference is in the estimator of the error variance.
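
A minimal sketch of the difference between the two variance estimators (simulated data and dimensions are illustrative):

```python
# MLE vs OLS estimators of the error variance: SSR/T vs SSR/(T - K - 1).
import numpy as np

rng = np.random.default_rng(3)
T, K = 100, 3
X = np.hstack([np.ones((T, 1)), rng.normal(size=(T, K))])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=T)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr = np.sum((y - X @ beta) ** 2)

sigma2_mle = ssr / T                # biased, but asymptotically unbiased
sigma2_ols = ssr / (T - K - 1)      # unbiased
print(sigma2_mle, sigma2_ols)
```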

Likelihood Ratio Test


Suppose we want to compare two regressions, a restricted and an unrestricted one. If the maximized likelihood of the restricted model is close to that of the unrestricted model (i.e. the restriction costs little in terms of fit), the restricted model is preferable to the unrestricted one.

The maximized value of the log-likelihood function $\mathcal{L} = \log(L)$ is

$$\mathcal{L}(\tilde{\beta}_0, \dots, \tilde{\beta}_K) = -C - \frac{T}{2}\log(\tilde{\sigma}^2_\varepsilon) - \frac{\sum_{t=1}^{T}\bigl(Y_t - \tilde{\beta}_0 - \tilde{\beta}_1 X_{t,1} - \dots - \tilde{\beta}_K X_{t,K}\bigr)^2}{2\tilde{\sigma}^2_\varepsilon}$$
$$= -C - \frac{T}{2}\log\left(\frac{SSR_U}{T}\right) - \frac{SSR_U}{2\,SSR_U/T} = -C - \frac{T}{2}\log(SSR_U) + \frac{T}{2}\log(T) - \frac{T}{2} = C' - \frac{T}{2}\log(SSR_U),$$

where $C'$ also absorbs the constants $-C$, $\frac{T}{2}\log(T)$ and $-\frac{T}{2}$.
2

This implies that the maximized likelihood is

$$L(\tilde{\beta}_0, \dots, \tilde{\beta}_K) = e^{C'}\, SSR_U^{-T/2},$$

so the maximized likelihood is a decreasing function of the SSR: the better the fit, the higher the likelihood.

Define, knowing that $K \geq Q$ (where $Q$ is the number of regressors in the restricted model),

$$\lambda = \frac{L(\tilde{\beta}'_0, \dots, \tilde{\beta}'_Q)}{L(\tilde{\beta}_0, \dots, \tilde{\beta}_K)} = \left(\frac{SSR_U}{SSR_R}\right)^{T/2}$$

$$LR = -2\log(\lambda) = T\log\left(\frac{SSR_R}{SSR_U}\right)$$

Since $SSR_R \geq SSR_U$, we have $LR \geq 0$ (because $\frac{SSR_R}{SSR_U} \geq 1$).
We compare the null against the alternative:

$$\begin{cases} H_0: LR = 0 \ \text{(the models are equivalent)} \\ H_1: LR > 0 \ \text{($U$ preferable to $R$)} \end{cases}$$

If we do not reject the null, the two models are equivalent, so we can rely on the more parsimonious one.
If we reject the null, we cannot rely on the more parsimonious model.

Under the null, $LR \sim \chi^2_m$, where $m$ is the number of restrictions.
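
A minimal sketch of the LR test on simulated data (the two models and the number of restrictions $m = 2$ below are illustrative assumptions):

```python
# Likelihood-ratio test: LR = T * log(SSR_R / SSR_U), compared to a chi-square.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T = 200
X = rng.normal(size=(T, 3))
y = 1.0 + 0.5 * X[:, 2] + rng.normal(size=T)     # X_1 and X_2 are irrelevant in this DGP

def ssr(y, X_design):
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    resid = y - X_design @ beta
    return resid @ resid

ones = np.ones((T, 1))
SSR_U = ssr(y, np.hstack([ones, X]))             # unrestricted model
SSR_R = ssr(y, np.hstack([ones, X[:, [2]]]))     # restricted model (drops X_1, X_2)

m = 2                                            # number of restrictions
LR = T * np.log(SSR_R / SSR_U)
p_value = stats.chi2.sf(LR, df=m)                # reject H0 if p_value < alpha
print(LR, p_value)
```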
