Download as pdf or txt
Download as pdf or txt
You are on page 1of 33

R300 Advanced Econometric Methods:

Problem Set 4

TA: Cheuk (Next online class 2pm-4pm)

November 10, 2022

1
Question 1

Suppose that
yi = xi β1 + zi β2 + εi .

State and show the Frisch-Waugh-Lovell theorem (say, for β1 ) by


explicit calculation.

2
Frisch-Waugh-Lovell Theorem

Recall the projection matrix

P = X (X ′ X )−1 X ′

and the annihilator matrix

M = I − P = I − X (X ′ X )−1 X ′

These matrices are symmetric and idempotent (A2 = A).

3
Frisch-Waugh-Lovell Theorem

FWL states

y = X1 βˆ1 + X2 βˆ2 + ε̂

and

M1 y = M1 X2 βˆ2 + ε̂, M1 = I − X1 (X1′ X1′ )−1 X1′


|{z} | {z }
û v̂

give the same βˆ2 .

4
Frisch-Waugh-Lovell Theorem

Proof:
y = X1 βˆ1 + X2 βˆ2 + ε̂
M1 y = M1 X1 βˆ1 + M1 X2 βˆ2 + M1 ε̂
| {z } |{z}
=0 =ε̂

5
Frisch-Waugh-Lovell Theorem

Proof:
y = X1 βˆ1 + X2 βˆ2 + ε̂
M1 y = M1 X1 βˆ1 + M1 X2 βˆ2 + M1 ε̂
| {z } |{z}
=0 =ε̂
where
M1 X1 = X1 − X1 (X1′ X1 )−1 X1′ X1 = 0

M1 ε̂ = I − X1 (X1′ X1 )−1 X1′ ε̂


 

= ε̂ − X1 (X1′ X1 )−1 X1′ ε̂


|{z}
=0

= ε̂
5
Question 1 - Solution

The Frisch-Waugh-Lovell theorem here goes as follows. Regressing


yi on zi gives residuals
∑i zi yi
ûi = yi − zi γ̂1 , γ̂1 = .
∑i zi2

Regressing xi on zi gives residuals


∑i zi xi
v̂i = xi − zi γ̂2 , γ̂2 = .
∑i zi2

Regressing ûi on v̂i gives slope coecient


∑ v̂i ûi
βˆ1 = i 2 .
∑i v̂i
This slope is numerically identical to the least-squares estimator
applied directly to the bivariate regression problem.
6
Question 1 - Solution

To see this rst note that the least-squares estimator solves


n
min
b1 ,b2
∑ (yi − xi b1 − zi b2 )2 .
i=1

The rst-order conditions are


n
∑ xi (yi − xi b1 − zi b2 ) = 0,
i=1
n
∑ zi (yi − xi b1 − zi b2 ) = 0.
i=1

From the second equation,


∑ zi (yi − xi b1 ) = ∑ zi2 b2
i i

and so
∑i zi (yi − xi b1 ) 7
b2 = .
∑ z2
Question 1 - Solution

Substituting back into the rst-order condition for b1 gives


n  
∑i ′ zi ′ yi ′ ∑i ′ zi ′ xi ′
∑ xi yi − xi b1 − zi ∑i ′ z 2′ + zi ∑i ′ z 2′ b1 = 0.
i=1 i i

But note that (using the denitions from above) this is


n
∑ xi ((yi − zi γ̂1 ) − (xi − zi γˆ2 ) b1 ) = 0,
i=1
or even n
∑ xi (ûi − v̂i b1 ) = 0.
i=1

The solution is
∑ xi ûi ∑ v̂i ûi
βˆ1 = i = i 2 .
x v̂
∑i i i ∑i v̂i

8
Question 1 - Solution

To see the last transition note that, by denition, xi = zi γ̂2 + v̂i .


Then,

∑ xi ûi = ∑(zi γ̂2 + v̂i ) ûi = γ̂2 ∑ zi ûi + ∑ v̂i ûi = ∑ v̂i ûi .
i i i i i

Indeed, ∑i zi ûi = 0 by properties of the least-squares regression of


yi on zi . The same argument explains why ∑i xi v̂i = ∑i v̂i2 .

Note that the regression of yi on zi is performed mostly for


didactical purposes. Replacing ûi by yi and estimating β1 by
∑i yi v̂i
∑i v̂i2

gives the same result. (Be sure to convince yourself of this.)


9
Question 2

Suppose x is continuous and uniformly distributed on the interval


[θ , θ + 1]. We wish to test

H0 : θ = 0 vs H1 : θ > 0.

Consider the procedure

Reject H0 if x > .95, Accept H0 otherwise.

(i) Compute the size of this test.

(ii) Derive the power function.

10
Hypothesis Testing

Reading: Casella and Berger Chapter 8.


Denition (Hypothesis)
A hypothesis is a statement about a population parameter

Denition (Null and Alternative Hypothesis)


In any hypothesis test, there are two opposing hypotheses: the null
hypothesis, denoted by H0 , and the alternative hypothesis, denoted by H1 .

The general form of above is given by:


H0 : θ ∈ Θ0 vs. H1 : θ ∈ Θc0 ≡ Θ/Θ0
where θ ∈ Θ is a parameter and Θ0 ⊂ Θ is a subset of the parameter space
mentioned in the null hypothesis.
Denition (Hypothesis Test)
A hypothesis test is a rule that species whether H0 is rejected or not for any
realizations of the sample.
11
Hypothesis Testing

Denition (Statistic)
A statistic is a measurable function of the sample. When a statistic is used
for estimation, it is called an estimator. When a statistic is used for
hypothesis testing, it is called a test statistic.

Denition (Rejection Region)


The rejection region is the subset of the sample space such that H0 is
rejected. This rejection region is often dened using the test statistic.

e.g. if we are testing the hypothesis that the population mean is θ , we might
dene the rejection region as

Rn = {x1 , x2 , . . . , xn : x̄n ≤ cn }

where cn is a number to be determined and is called the critical value.

12
Hypothesis Testing

Denition (Power Function)


The power function of a hypothesis test with rejection region R is the
function of θ dened by

β (θ ) = Pθ (Xn ∈ Rn ).

13
Hypothesis Testing

Denition (Power Function)


The power function of a hypothesis test with rejection region R is the
function of θ dened by

β (θ ) = Pθ (Xn ∈ Rn ).

Denition (Type I and Type II Errors)


ˆ Type I: reject H0 when H0 is true
ˆ Type II: accept H0 when H1 is true

13
Hypothesis Testing

Relating power function to Type I and Type 2 Errors


Theorem

probability of a Type I Error if θ ∈ Θ0
β (θ ) =
1 − probability of a Type II Error if θ ∈ Θc0

A good test has power function near 1 for θ ∈ θ0c and near 0 for θ ∈ Θ0 .

Denition (Size)
The size is the maximum probability of error of type I among distributions in
the null hypothesis:

sizen = sup Pθ (Xn ∈ Rn )


θ ∈Θ0

14
Hypothesis Testing

Denition (Size α Test)


A test with power function β (θ ) is a size α test if

sup β (θ ) = α.
θ ∈Θ0

where α ∈ [0, 1].

Denition (Level α Test)


A test with power function β (θ ) is a level α test if

sup β (θ ) ≤ α.
θ ∈Θ0

where α ∈ [0, 1].

We dened these object to xed the probability of committing type 1 error.


Then it makes sense to talk about maximizing power. Otherwise, we can
trivially achieve maximum power by reject everything .
15
Hypothesis Testing

Denition (Unbiased Test)


A test with power function β (θ ) is unbiased if β (θ ′ ) ≥ β (θ ′′ ) for every
θ ′ ∈ Θc0 and θ ′′ ∈ Θ0 .

A unbiased test is more likely to reject H0 if θ ∈ Θc0 than if θ ∈ Θ0 .

16
Hypothesis Testing

Denition (Unbiased Test)


A test with power function β (θ ) is unbiased if β (θ ′ ) ≥ β (θ ′′ ) for every
θ ′ ∈ Θc0 and θ ′′ ∈ Θ0 .

A unbiased test is more likely to reject H0 if θ ∈ Θc0 than if θ ∈ Θ0 .

Denition (Uniformly Most Powerful (UMP))


Let C be the class of all level α tests. A test in class C with power function
β (θ ) is uniformly most powerful if

β (θ ) ≥ β ′ (θ )

for every θ ∈ Θc0 , where β ′ (θ ) is a power function of another test in class C .

Recall β (θ ) is equal to one minus probability of a Type II error when θ ∈ Θc0 .


This means an UMP test minimizes the Type II error while holding the Type I
error constant.

16
Task 2 - Solution

(i) Under H0 , x is uniformly distributed on [0, 1]. Hence,

sup Pθ (X > 0.95) = Pθ0 =0 (X > 0.95)


θ ∈Θ0

= 1 − 0.95 = 0.05

so that this test has size α = .05.

17
Task 2 - Solution

(i) Under H0 , x is uniformly distributed on [0, 1]. Hence,

sup Pθ (X > 0.95) = Pθ0 =0 (X > 0.95)


θ ∈Θ0

= 1 − 0.95 = 0.05

so that this test has size α = .05.

(ii) Given the parameter θ , x is uniformly distributed on [θ , θ + 1] and so

Pθ (x ≤ v ) = v − θ .

17
Task 2 - Solution

(i) Under H0 , x is uniformly distributed on [0, 1]. Hence,

sup Pθ (X > 0.95) = Pθ0 =0 (X > 0.95)


θ ∈Θ0

= 1 − 0.95 = 0.05

so that this test has size α = .05.

(ii) Given the parameter θ , x is uniformly distributed on [θ , θ + 1] and so

Pθ (x ≤ v ) = v − θ .

The power function at θ is

 0 if θ ≤ −.05


β (θ ) := Pθ (x > .95) = 1 − Pθ (x ≤ .95) = θ + .05 if θ ∈ (−.05, .95]) ,
 1

if θ > .95

17
Task 2 - Solution

(i) Under H0 , x is uniformly distributed on [0, 1]. Hence,

sup Pθ (X > 0.95) = Pθ0 =0 (X > 0.95)


θ ∈Θ0

= 1 − 0.95 = 0.05

so that this test has size α = .05.

(ii) Given the parameter θ , x is uniformly distributed on [θ , θ + 1] and so

Pθ (x ≤ v ) = v − θ .

The power function at θ is

 0 if θ ≤ −.05


β (θ ) := Pθ (x > .95) = 1 − Pθ (x ≤ .95) = θ + .05 if θ ∈ (−.05, .95]) ,
 1

if θ > .95
Note that β (0) = 0.05 < β (0 + ε) = 0.05 + ε . This is an unbiased test!

17
Illustration - one-sided test

Figure 1: Source: lecture slides


18
Illustration - two-sided test

Figure 2: Source: lecture slides


19
Question 3

Suppose x ∼ N(µ, σ 2 ). Consider two independent random samples


on x , {x1i }ni=1 and {x2i }ni=1 . Find a sample size n so that
 σ
P |x 1 − x 2 | <
5
is .99. Explain how you proceed.

20
Question 3 - Solution

By normality both sample means x 1 and x 2 have distribution


N(µ, σ 2 /n). By independence, their difference x 1 − x 2 is
distributed as N(0, 2σ 2 /n). Now,
 σ  σ σ  1/5 1/5 
P |x 1 − x 2 | < = P − < x1 − x2 < =P −√ √ <z < √ √
5 5 5 2/ n 2/ n
√ √
for z = (x 1 − x 2 )/( 2σ / n) standard normal.

21
Question 3 - Solution

By normality both sample means x 1 and x 2 have distribution


N(µ, σ 2 /n). By independence, their difference x 1 − x 2 is
distributed as N(0, 2σ 2 /n). Now,
 σ  σ σ  1/5 1/5 
P |x 1 − x 2 | < = P − < x1 − x2 < =P −√ √ <z < √ √
5 5 5 2/ n 2/ n
√ √
for z = (x 1 − x 2 )/( 2σ / n) standard normal.
Thus, we need to solve
1 n 1rn 2
r
= .005 =⇒ 2 5Φ−1 (0.995) = n
 
P z≥ = 1−Φ
5 2 5 2
for n. Solving it gives us that this is equivalent to solving 51 n2 = 2.576 for n
p

(e.g. Use norminv from Matlab or look it up from a table). This yields n ≈ 332.

21
Question 4

Suppose that xi is exponential with density

fθ (x) = θ e −xθ ,

where θ > 0 and x ≥ 0.

(i) Derive the maximum likelihood estimator of θ , say θ̂ .

(ii) Derive the asymptotic distribution of θ̂ .

(iii) Derive the asymptotic distribution of the estimator 1/θ̂ .

22
Delta Method

Theorem (Delta Method)


Let {Xn }n≥1 be a sequence of k−dimensional random variables,
let µ ∈ Rk , let µ ∈ Rk such that

→ N(0k , Σ)
d
n(Xn − µ) −

and let F : Rk → Ru be a continuous dierentiable function at µ .


Then

→ N(0µ , DF (µ)ΣDF (µ)′ )
d
n(F (Xn ) − F (µ)) −

where DF (µ) is the gradient (Jacobian) of F evaluated at µ .

23
Delta Method

Proof - Univariate :
By the mean value theorem (1st order Taylor expansion around θ )

F (Xn ) = F (θ ) + DF (θ̃ )(Xn − θ )

where θ̃ lies between Xn and θ .



Rearrange the object above and multiply by n to get
√ √
n[F (Xn ) − F (θ )] = DF (θ̃ ) n(Xn − θ )
| {z } | {z }
p d
−→DF (θ ) − 2
→N(0,σ )
p p
where DF (θ̃ ) −
→ DF (θ ) because θ̃ −
→ θ and continuous mapping theorem.
p
Note that θ̃ −
→ θ because |θ̃ − θ | < |Xn − θ |.

24
Question 4 - Solution

(i)
log fθ (x) = log θ − xθ
and so
∂ log fθ (x) 1
= −x
∂θ θ
The MLE is 1/x .

(ii) The information bound is θ 2 /n. The asymptotic distribution result is



n(θ̂ − θ ) → N(0, θ 2 ).
d

(iii) We have F (θ ) = 1 and DF (θ ) = − θ12 . Apply the delta method


θ

√ 1 1 d  1 2
n − → N(0,
− θ 2 ) = N(0, 1/θ 2 ).
θ̂ θ θ

25

You might also like