
Econ 3334

Module 5
Linear Regression with a
Single Regressor:
Inference
Department of Economics, HKUST
Instructor: Junlong Feng
Fall 2022
Menu of Module 5

I. Hypothesis testing
II. Confidence interval
III. Two-sample mean differential
IV. Variance estimation and heteroscedasticity
I. Hypothesis testing

Linear regression model Yᵢ = β₀ + β₁Xᵢ + uᵢ under unconfoundedness, i.i.d. sampling, and no
large outliers:
• E[Y|X = b] − E[Y|X = a]: ATE of X changing from a to b on Y.
• E[Y|X] = β₀ + β₁X
• OLS estimators β̂₁ ≡ ∑ᵢ(Xᵢ − X̄)(Yᵢ − Ȳ) / ∑ᵢ(Xᵢ − X̄)² and β̂₀ ≡ Ȳ − β̂₁X̄ are
  • Unbiased for β₀ and β₁
  • Consistent for β₀ and β₁
  • Asymptotically normal:
    β̂₁ is approximately N(β₁, σ²_{β̂₁}), where σ²_{β̂₁} = Var[(Xᵢ − μ_X)uᵢ] / (nσ⁴_X)
    β̂₀ is approximately N(β₀, σ²_{β̂₀}), where σ²_{β̂₀} = Var[Hᵢuᵢ] / (n(E[Hᵢ²])²), where Hᵢ = 1 − (μ_X / E[Xᵢ²])Xᵢ
I. Hypothesis testing

Recall estimation and inference for the population mean μ_Y:
• By the CLT: Ȳ is approximately N(μ_Y, σ²_Y/n).
• Standardize: √n(Ȳ − μ_Y)/σ_Y is approximately N(0, 1).
• For a null H₀: μ_Y = μ_{Y,0} against H₁: μ_Y ≠ μ_{Y,0} with size α,
    Pr(|√n(Ȳ − μ_{Y,0})/σ_Y| ≤ z_{1−α/2}) ≈ 1 − α
• In practice, σ_Y is unknown. Replace it by a consistent estimator s_Y (the sample standard
  deviation: √(∑ᵢ(Yᵢ − Ȳ)² / (n − 1))).
• Reject if |√n(Ȳ − μ_{Y,0})/s_Y| > z_{1−α/2}. Do not reject if |√n(Ȳ − μ_{Y,0})/s_Y| ≤ z_{1−α/2}.
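The recipe above can be sketched in code. This is a minimal illustration, not part of the course materials; the function name and the made-up sample are my own.

```python
# Minimal sketch of the two-sided test for a population mean (stdlib only).
import math


def mean_t_stat(sample, mu_0):
    """t-statistic sqrt(n) * (Ybar - mu_0) / s_Y for H0: mu_Y = mu_0."""
    n = len(sample)
    y_bar = sum(sample) / n
    # sample standard deviation with the (n - 1) divisor, as on the slide
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in sample) / (n - 1))
    return math.sqrt(n) * (y_bar - mu_0) / s_y


sample = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1]  # made-up data
t = mean_t_stat(sample, mu_0=2.0)
reject = abs(t) > 1.96  # two-sided test at size alpha = 0.05
```

For this sample the t-statistic is about 1.41, below 1.96, so H₀: μ_Y = 2.0 is not rejected at the 5% level.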
I. Hypothesis testing

Now suppose we want to test the null H₀: β₁ = β_{1,0} against H₁: β₁ ≠ β_{1,0}. We can use exactly
the same idea.
• β̂₁ is approximately N(β₁, σ²_{β̂₁}).
• Standardize: (β̂₁ − β₁)/σ_{β̂₁} is approximately N(0, 1).
• Under the null H₀: β₁ = β_{1,0} with size α,
    Pr(|(β̂₁ − β_{1,0})/σ_{β̂₁}| ≤ z_{1−α/2}) ≈ 1 − α
• In practice, σ_{β̂₁} is unknown. We need to replace it by a consistent estimator. Recall σ²_{β̂₁} =
  Var[(Xᵢ − μ_X)uᵢ] / (nσ⁴_X). Details will be given later. For now, call the estimator SE(β̂₁).
• Reject if |(β̂₁ − β_{1,0})/SE(β̂₁)| > z_{1−α/2}. Do not reject if |(β̂₁ − β_{1,0})/SE(β̂₁)| ≤ z_{1−α/2}.
I. Hypothesis testing

Reject if |(β̂₁ − β_{1,0})/SE(β̂₁)| > z_{1−α/2}. Do not reject if |(β̂₁ − β_{1,0})/SE(β̂₁)| ≤ z_{1−α/2}.
• (β̂₁ − β_{1,0})/SE(β̂₁) is a t-statistic.
• The rejection rule is a two-sided t-test.
• With critical value z_{1−α/2}, the size (or significance level) of the test is controlled
  at α.
• The asymptotic power is 1, just like the t-test for the population mean with the sample
  average as the estimator.
I. Hypothesis testing

The same procedure works for H₀: β₀ = β_{0,0} vs H₁: β₀ ≠ β_{0,0}:
• Form a t-statistic: (β̂₀ − β_{0,0})/SE(β̂₀).
• T-test: reject if |(β̂₀ − β_{0,0})/SE(β̂₀)| > z_{1−α/2}. Do not reject if |(β̂₀ − β_{0,0})/SE(β̂₀)| ≤ z_{1−α/2}.
I. Hypothesis testing

Example:

  Testscore^ = 698.9 − 2.28 ⋅ STR
               (10.4)   (0.52)

• Convention: the numbers in parentheses below the estimated parameters are the
  standard errors.
• β̂₀ = 698.9; SE(β̂₀) = 10.4.
• β̂₁ = −2.28; SE(β̂₁) = 0.52.
• Consider a two-sided test for H₀: β₁ = 0.
• Interpretation of the null: since β₁ is the marginal average causal effect of STR on test
  score, β₁ = 0 means STR has no causal effect on test score on average.
• T-statistic: |−2.28/0.52| = 4.38 > 2.58 = z_{0.995}. So reject the null at the 1% significance level.
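As a sketch, the decision rule can be wrapped in a small function and checked against the numbers on this slide; the function name and critical-value table are my own, not from the course.

```python
# Two-sided t-test decision rule with the usual normal critical values.
CRITICAL_VALUES = {0.10: 1.64, 0.05: 1.96, 0.01: 2.58}  # z_{1 - alpha/2}


def two_sided_t_test(beta_hat, beta_null, se, alpha):
    """Return (t_stat, reject): reject H0 when |t| > z_{1 - alpha/2}."""
    t_stat = (beta_hat - beta_null) / se
    return t_stat, abs(t_stat) > CRITICAL_VALUES[alpha]


# The slide's example: beta_1_hat = -2.28, SE = 0.52, H0: beta_1 = 0.
t, reject = two_sided_t_test(-2.28, 0.0, 0.52, alpha=0.01)
```

Here |t| ≈ 4.38 > 2.58, so the null is rejected at the 1% level, matching the slide.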
I. Hypothesis testing

We can also compute the p-value.
• For instance, for the null β₁ = β_{1,0} with the two-sided alternative,
    p = 2Φ(−|(β̂₁ − β_{1,0})/SE(β̂₁)|)
• In the example, p = 2Φ(−4.38) = 1.19×10⁻⁵ < 0.01. So, again, significant at the 1%
  level.
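The p-value formula can be checked numerically with the standard library alone; the helper built from `math.erf` is my own, not something used in the course.

```python
# Sketch: two-sided p-value p = 2 * Phi(-|t|), with Phi built from math.erf.
import math


def normal_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p_value(t_stat):
    return 2.0 * normal_cdf(-abs(t_stat))


p = two_sided_p_value(4.38)  # the t-statistic from the STR example
```

This gives p ≈ 1.19×10⁻⁵, matching the slide; since p < 0.01, the null is rejected at the 1% level.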
I. Hypothesis testing

Example:

  Testscore^ = 698.9 − 2.28 ⋅ STR
               (10.4)   (0.52)

• We can also test hypotheses about β₀.
• E.g. H₀: β₀ = 690 vs H₁: β₀ ≠ 690.
• T-statistic: (698.9 − 690)/10.4 = 0.86.
• Smaller than any commonly used critical value.
• Do not reject at α = 0.1, 0.05, 0.01.
• p = 2Φ(−0.86) = 0.39.
I. Hypothesis testing

Output from R

• The t-values and p-values are all for H₀: parameter = 0 against a two-sided
  alternative.
II. Confidence interval

Recall there is another approach to statistical inference:
• Based on the estimator, build a set/interval such that it covers the true
  parameter with a large probability (1 − α).
• A (1 − α) CI for β₁: [β̂₁ − z_{1−α/2} ⋅ SE(β̂₁), β̂₁ + z_{1−α/2} ⋅ SE(β̂₁)]
• z_{1−α/2} is again the (1 − α/2)th quantile of N(0, 1):
  • For α = 0.01, z_{1−α/2} = 2.58. For α = 0.05, z_{1−α/2} = 1.96. For α = 0.1, z_{1−α/2} = 1.64.
• The coverage probability is still derived from Pr(|(β̂₁ − β₁)/SE(β̂₁)| ≤ z_{1−α/2}) ≈ 1 − α.
• Similarly, a (1 − α) CI for β₀: [β̂₀ − z_{1−α/2} ⋅ SE(β̂₀), β̂₀ + z_{1−α/2} ⋅ SE(β̂₀)]
II. Confidence interval

Example:

  Testscore^ = 698.9 − 2.28 ⋅ STR
               (10.4)   (0.52)

• A 95% confidence interval for β₁: [−2.28 − 1.96×0.52, −2.28 + 1.96×0.52] =
  [−3.30, −1.26].
  • 0 is not in the confidence interval, so H₀: β₁ = 0 can be rejected if the alternative is two-sided.
  • We cannot reject any null inside the interval.
• A 95% confidence interval for β₀: [698.9 − 1.96×10.4, 698.9 + 1.96×10.4] =
  [678.52, 719.28].
  • 690 is in the confidence set.
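These intervals are easy to reproduce in code; a minimal sketch (the helper name is my own):

```python
# Sketch: a (1 - alpha) confidence interval beta_hat +/- z * SE(beta_hat).
def confidence_interval(beta_hat, se, z=1.96):  # z = 1.96 for 95% coverage
    return (beta_hat - z * se, beta_hat + z * se)


ci_slope = confidence_interval(-2.28, 0.52)      # for beta_1
ci_intercept = confidence_interval(698.9, 10.4)  # for beta_0
```

This reproduces approximately (−3.30, −1.26) and (678.52, 719.28), matching the slide.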
II. Confidence interval

From the output from R, you’re able to compute the confidence interval for any
given coverage probability.
• Example: the 90% CI for β₀, i.e., the intercept, is [698.93 − 1.64 ⋅ 10.36, 698.93 + 1.64 ⋅ 10.36].
III. Two-sample mean differential

We didn’t talk about the two-sample mean testing problem. That is not because it lacks
importance, but because the conventional approach is not convenient.
• The two-sample mean testing problem is crucial for causal inference.
• Suppose you randomize a treatment variable D ∈ {0, 1} in an i.i.d. sample.
  • Dᵢ = 1 means individual i receives the treatment.
  • Dᵢ = 0 means individual i does not receive the treatment.
  • Examples of D include vaccines, vouchers, the draft (military service), etc.
• Let the outcome for i be Yᵢ. By the randomness of Dᵢ,
    ATE = E[Yᵢ|Dᵢ = 1] − E[Yᵢ|Dᵢ = 0]
• H₀: ATE = 0 is equivalent to H₀: E[Yᵢ|Dᵢ = 1] = E[Yᵢ|Dᵢ = 0].
III. Two-sample mean differential

A conventional method: the two-sample mean test.
• Divide your sample into two subsamples, each associated with one value of D.
• Calculate the two subsample averages.
• Figure out the asymptotic distribution of the subsample average difference.
• Conduct a t-test.
This is inconvenient and needs new formulas.
III. Two-sample mean differential

We can solve the problem by simply running a regression:
• Write down a linear regression model:
    Yᵢ = β₀ + β₁Dᵢ + uᵢ
• Since Dᵢ is randomly assigned, the assumption E[uᵢ|Dᵢ] = E[uᵢ] is plausible.
  • E[Yᵢ|Dᵢ = 0] = β₀ + E[uᵢ]
  • E[Yᵢ|Dᵢ = 1] = β₀ + β₁ + E[uᵢ]
  • Therefore, β₁ = E[Yᵢ|Dᵢ = 1] − E[Yᵢ|Dᵢ = 0]
• Run OLS, get β̂₁, and test whether it is zero.
• Further, by constructing the confidence interval, we can get a range for the true ATE
  with confidence.
• A unified approach: no separate procedures or formulas are required.
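The derivation above implies that, with a binary regressor, the OLS slope equals the difference of the two subsample means exactly. A quick check with made-up data (all names and numbers are my own):

```python
# With binary D, the OLS slope equals mean(Y | D=1) - mean(Y | D=0).
def ols_slope(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


d = [0, 0, 0, 1, 1, 1, 1]                              # made-up treatment
y = [640.0, 650.0, 660.0, 655.0, 665.0, 670.0, 662.0]  # made-up outcomes

treated = [yi for di, yi in zip(d, y) if di == 1]
control = [yi for di, yi in zip(d, y) if di == 0]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)
slope = ols_slope(d, y)  # numerically identical to diff_in_means
```

Here both equal 13.0: running OLS on the binary treatment reproduces the two-sample mean difference, which is why one regression replaces the separate two-sample procedure.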
III. Two-sample mean differential

Example: In the STR–test score data, construct D = 1 if STR < 20 and D = 0 otherwise.
• Interpretation: β̂₀ is the sample average of test scores for the group D = 0; β̂₁ is the sample mean
  difference of test scores between the two groups; β̂₀ + β̂₁ is the sample average of test scores for the group D = 1.
• For the null H₀: β₁ = 0, the t-value is 4.04, significant at all commonly adopted levels.
• The means of the two subsamples (STR < 20 and STR ≥ 20) are thus significantly different at all
  commonly adopted levels.
• 95% confidence interval for β₁: [7.37 − 1.96 ⋅ 1.82, 7.37 + 1.96 ⋅ 1.82].
IV. Variance estimation and heteroscedasticity

The rationale behind the tests and confidence intervals is

  (β̂₁ − β₁)/σ_{β̂₁} ~ N(0, 1) and (β̂₀ − β₀)/σ_{β̂₀} ~ N(0, 1)

• However, in practice we use SE(β̂₁) and SE(β̂₀) to replace σ_{β̂₁} and σ_{β̂₀}.
• This is because σ_{β̂₁} and σ_{β̂₀} are not directly computable:
    σ²_{β̂₁} = Var[(Xᵢ − μ_X)uᵢ] / (nσ⁴_X),  σ²_{β̂₀} = Var[Hᵢuᵢ] / (n(E[Hᵢ²])²), where Hᵢ = 1 − (μ_X / E[Xᵢ²])Xᵢ
• SE(β̂₁) needs to be a consistent estimator of σ_{β̂₁}. The same holds for SE(β̂₀).
• We now only discuss SE(β̂₁) for simplicity.
IV. Variance estimation and heteroscedasticity

For σ²_{β̂₁} = Var[(Xᵢ − μ_X)uᵢ] / (nσ⁴_X), a natural estimator is

  SE(β̂₁) = √[ (1/n) ⋅ ( (1/n)∑ᵢ(Xᵢ − X̄)²ûᵢ² ) / ( (1/n)∑ᵢ(Xᵢ − X̄)² )² ]

This estimator makes sense because
• In the denominator, (1/n)∑ᵢ(Xᵢ − X̄)² is consistent for σ²_X.
• For the numerator, note that
    Var[(Xᵢ − μ_X)uᵢ] = E[((Xᵢ − μ_X)uᵢ)²] − (E[(Xᵢ − μ_X)uᵢ])²
  The second term is 0 because
    E[(Xᵢ − μ_X)uᵢ] = E[E[(Xᵢ − μ_X)uᵢ | Xᵢ]] = E[(Xᵢ − μ_X)E[uᵢ | Xᵢ]] = E[(Xᵢ − μ_X) ⋅ 0] = 0
  Meanwhile, (1/n)∑ᵢ(Xᵢ − X̄)²ûᵢ² is consistent for E[((Xᵢ − μ_X)uᵢ)²].
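A sketch of this estimator in code, fitted on made-up data (the helper names and the numbers are my own):

```python
# Heteroscedasticity-robust SE for the OLS slope, following the formula above.
import math


def ols_fit(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1  # (b0, b1)


def robust_se_slope(x, y):
    n = len(x)
    b0, b1 = ols_fit(x, y)
    x_bar = sum(x) / n
    u_hat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # OLS residuals
    num = sum((xi - x_bar) ** 2 * ui ** 2 for xi, ui in zip(x, u_hat)) / n
    den = (sum((xi - x_bar) ** 2 for xi in x) / n) ** 2
    return math.sqrt((num / den) / n)


# Made-up data with a rough downward trend; residuals are nonzero, so SE > 0.
se = robust_se_slope([15.0, 17.0, 19.0, 21.0], [660.0, 655.0, 649.0, 646.0])
```

A sanity check on the formula: with only two points the OLS line fits exactly, the residuals are zero, and the estimated SE is zero.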
IV. Variance estimation and heteroscedasticity

For σ²_{β̂₁} = Var[(Xᵢ − μ_X)uᵢ] / (nσ⁴_X), the natural estimator

  SE(β̂₁) = √[ (1/n) ⋅ ( (1/n)∑ᵢ(Xᵢ − X̄)²ûᵢ² ) / ( (1/n)∑ᵢ(Xᵢ − X̄)² )² ]

is complicated. This is because σ_{β̂₁} itself is complicated. Under one condition,
σ_{β̂₁} can be simplified.
• Homoscedasticity: E[uᵢ² | Xᵢ] = E[uᵢ²] ≡ σ²_u (or Var[uᵢ | Xᵢ] = Var[uᵢ] ≡ σ²_u. Why?)
• Under this condition, recall Var[(Xᵢ − μ_X)uᵢ] = E[((Xᵢ − μ_X)uᵢ)²]. And
    E[((Xᵢ − μ_X)uᵢ)²] = E[(Xᵢ − μ_X)²uᵢ²] = E[E[uᵢ² | Xᵢ](Xᵢ − μ_X)²] = σ²_u σ²_X
• Therefore, σ²_{β̂₁} under homoscedasticity simplifies to σ²_u / (nσ²_X).
IV. Variance estimation and heteroscedasticity

Under homoscedasticity, σ²_{β̂₁} = σ²_u / (nσ²_X). A corresponding consistent estimator is

  SE(β̂₁) = √[ (1/n) ⋅ ( (1/n)∑ᵢûᵢ² ) / ( (1/n)∑ᵢ(Xᵢ − X̄)² ) ]

• Much simpler than the general case without assuming homoscedasticity.
• Incorrect when homoscedasticity fails.
  • When homoscedasticity fails, this simpler SE(β̂₁)² is still consistent for σ²_u / (nσ²_X), but
    σ²_u / (nσ²_X) is no longer equal to σ²_{β̂₁}. A t-test based on this simpler SE(β̂₁) then no
    longer has the desired size control.
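To see the contrast concretely, here is a sketch computing both formulas on made-up data whose error spread grows with X; all names and numbers are my own.

```python
# Homoscedasticity-only vs. heteroscedasticity-robust SE for the OLS slope.
import math


def slope_ses(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # OLS residuals
    # simpler formula: valid only under homoscedasticity
    se_homo = math.sqrt(sum(ui ** 2 for ui in u) / n / sxx)
    # robust formula: valid in both scenarios
    num = sum((xi - x_bar) ** 2 * ui ** 2 for xi, ui in zip(x, u)) / n
    se_robust = math.sqrt((num / (sxx / n) ** 2) / n)
    return se_homo, se_robust


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.5, 7.2, 11.5, 10.0]  # noisier at larger X (heteroscedastic)
se_homo, se_robust = slope_ses(x, y)
```

On this sample the two formulas disagree (roughly 0.246 vs. 0.261): the large residuals sit at large |Xᵢ − X̄|, which the robust numerator weights up while the simpler formula ignores.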
IV. Variance estimation and heteroscedasticity

When E[uᵢ² | Xᵢ] ≠ E[uᵢ²] with positive probability, we say the error is
heteroscedastic.
• Homo- and heteroscedasticity concern whether the conditional variance of uᵢ given Xᵢ
  depends on Xᵢ.
• Our three assumptions for OLS to be unbiased, consistent, and asymptotically normal
  are unrelated to them.
• The only thing that matters is whether you can use the simpler formula for the SE.
  • The simpler formula can only be used under homoscedasticity.
  • The more complicated one can be used in both scenarios because we derived it without
    assuming either homo- or heteroscedasticity.
  • For this reason, the more complicated formula is called the heteroscedasticity-robust
    standard error.
IV. Variance estimation and heteroscedasticity

In the past, people tested for homoscedasticity first, and if homoscedasticity could not be
rejected, they used the simpler formula for the SE.

This is NOT necessary, because homoscedasticity tests are known to have many problems (low
power, etc.).

Why not just use a universal formula that applies to both cases and stay
agnostic about the conditional variance? It doesn’t affect the unbiasedness or
consistency of OLS anyway.

Just using the heteroscedasticity-robust SE (the more complicated version) without
worrying about heteroscedasticity is today’s standard.
