
Statistics 2 for Economics

Course code: 6012B0451 2023–2024

Syllabus B
Version 1.0 October 2023

Weeks 5, 6 and 7

Faculty of Economics and Business


University of Amsterdam
Statistics 2 for Economics
Table of contents
Supplements
5 – Deriving the normal equations ............................................................................................ 3
6 – Summary – ANOVA table ..................................................................................................... 4
7 – Formula sheet – Simple regression analysis (k = 1) .......................................................... 5
8 – Overview of conditions for simple regression ..................................................................... 6

Week 5 – Probability theory, estimators, consistency


Homework exercises ................................................................................................................. 7
Homework solutions ............................................................................................................... 11
Tutorial exercises..................................................................................................................... 17

Week 6 – Simple regression


Homework exercises ............................................................................................................... 19
Homework solutions ............................................................................................................... 24
Tutorial exercises..................................................................................................................... 29

Week 7 – Simple regression


Homework exercises ............................................................................................................... 32
Homework solutions ............................................................................................................... 35
Tutorial exercises..................................................................................................................... 40

* * *

S&W = Stock, J.H. and Watson, M.W. – Introduction to Econometrics


Pearson Education, fourth edition, ISBN 13: 978-1-292-26445-5

Week 5
S&W Ch.2.3 (p65-74, also 62)
S&W Ch.2.5 (p81-84)
S&W Ch.2.6 (p85-90)
S&W Ch.3.1 (p104-108), App3.2, p112-113 (in 3.2), p129 (in 3.7), (p694)

Week 6
S&W Ch.4, App4.2-4.3

Week 7
S&W Ch.5, App5.1 (and 18.1)

* * *

2
Supplement 5
Deriving the normal equations
The least squares method was introduced in Chapter 4. The objective was to determine the sample regression line

    Ŷ = β̂0 + β̂1·X

that minimizes the sum of squared residuals SSR between the points and the line. That is, the method determines values for β̂0 and β̂1 such that

    SSR = Σ_{i=1}^n (Y_i − Ŷ_i)²

is minimized. Since Ŷ_i = β̂0 + β̂1·X_i we have:

    SSR = Σ_{i=1}^n (Y_i − Ŷ_i)² = Σ_{i=1}^n (Y_i − (β̂0 + β̂1·X_i))²

Finding the values of β̂0 and β̂1 that minimize SSR is accomplished using differential calculus. We begin by partially differentiating SSR with respect to β̂0 and β̂1, setting the partial derivatives equal to zero, and solving the two equations:

    ∂SSR/∂β̂0 = −2·Σ_{i=1}^n (Y_i − (β̂0 + β̂1·X_i)) = 0

    ∂SSR/∂β̂1 = −2·Σ_{i=1}^n (Y_i − (β̂0 + β̂1·X_i))·X_i = 0

These two equations can now be reduced to what are called "the normal equations":

    Σ_{i=1}^n Y_i − n·β̂0 − β̂1·Σ_{i=1}^n X_i = 0

    Σ_{i=1}^n X_i·Y_i − β̂0·Σ_{i=1}^n X_i − β̂1·Σ_{i=1}^n X_i² = 0

The normal equations are solved simultaneously to yield:

    β̂0 = Ȳ − β̂1·X̄

    β̂1 = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / Σ_{i=1}^n (X_i − X̄)² = (n−1)·s_XY / ((n−1)·s_X²) = s_XY / s_X²

Since SSR is a convex (quadratic) function of β̂0 and β̂1, a minimum of SSR (and not a maximum) occurs at (β̂0, β̂1).
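As a numerical sanity check (an illustrative sketch with made-up data, not part of the syllabus), the closed-form coefficients can be verified to satisfy both normal equations and to agree with a generic least squares solver:

```python
import numpy as np

# Made-up illustration data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(X)

# Closed-form OLS coefficients derived from the normal equations
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

# Both normal equations must hold (up to floating-point rounding)
res = Y - (b0 + b1 * X)
assert abs(res.sum()) < 1e-9           # first normal equation
assert abs((res * X).sum()) < 1e-9     # second normal equation

# Cross-check against numpy's general least squares solver
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), Y, rcond=None)
assert np.allclose([b0, b1], coef)
```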

3
Supplement 6
Summary – ANOVA table

Simple regression: k = 1,  Ŷ = β̂0 + β̂1·X

ANOVA        Sum of Squares                     using R²        using sample variances         df      Mean Square
Explained    ESS = Σ (Ŷ_i − Ȳ)²                = R²·TSS        = (n−1)·s_XY²/s_X²             1       ESS/1
Residuals    SSR = Σ (Y_i − Ŷ_i)²              = (1−R²)·TSS    = (n−1)·(s_Y² − s_XY²/s_X²)    n−2     SSR/(n−2) = s_û²
Total        TSS = Σ (Y_i − Ȳ)² = ESS + SSR                    = (n−1)·s_Y²                   n−1     TSS/(n−1) = s_Y²

    R² = ESS/TSS = 1 − SSR/TSS = s_XY²/(s_X²·s_Y²) = 1 − (n−2)·s_û² / ((n−1)·s_Y²)
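The identities in the table can be verified numerically; the sketch below uses simulated data (all numbers invented for illustration):

```python
import numpy as np

# Simulated illustration data
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)

# OLS fit using the formula beta1-hat = s_XY / s_X^2
s_XY = np.cov(X, Y, ddof=1)[0, 1]
b1 = s_XY / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X

ESS = np.sum((Yhat - Y.mean()) ** 2)
SSR = np.sum((Y - Yhat) ** 2)
TSS = np.sum((Y - Y.mean()) ** 2)

# The decomposition and the R^2 identities from the table
assert np.isclose(ESS + SSR, TSS)
R2 = ESS / TSS
assert np.isclose(R2, 1 - SSR / TSS)
assert np.isclose(R2, s_XY**2 / (np.var(X, ddof=1) * np.var(Y, ddof=1)))
```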
Supplement 7
Formula sheet – Simple regression analysis (k = 1)

Rules for expectations, variances and covariances:

    E(a·X + b·Y + c) = a·μ_X + b·μ_Y + c
    var(a·X + b·Y + c) = a²·σ_X² + b²·σ_Y² + 2ab·σ_XY
    cov(a·X + b·Y + c, W) = a·σ_XW + b·σ_YW
    E(Y) = E[E(Y|X)]
    var(Y | X = x) = E( [Y − E(Y | X = x)]² | X = x )

(E) Regression analysis

(E1) Regression for k = 1

    Ŷ_i = β̂0 + β̂1·X_i
    β̂0 = Ȳ − β̂1·X̄
    β̂1 = s_XY / s_X²
    s_XY = 1/(n−1)·Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ)
    s_X² = 1/(n−1)·Σ_{i=1}^n (X_i − X̄)²
    û_i = Y_i − Ŷ_i
    ESS = Σ_{i=1}^n (Ŷ_i − Ȳ)² = (n−1)·s_XY²/s_X²
    SSR = Σ_{i=1}^n (Y_i − Ŷ_i)² = (n−1)·(s_Y² − s_XY²/s_X²) = Σ_{i=1}^n û_i²
    TSS = Σ_{i=1}^n (Y_i − Ȳ)² = (n−1)·s_Y²
    TSS = ESS + SSR
    SER = s_û = √( SSR/(n−2) )
    R² = ESS/TSS = 1 − SSR/TSS = s_XY² / (s_X²·s_Y²) = r_XY²

Variances of the LS coefficients (general case):

    σ²_β̂1 = (1/n)·var[(X_i − μ_X)·u_i] / [var(X_i)]²
    σ²_β̂0 = (1/n)·var(H_i·u_i) / [E(H_i²)]²    with H_i = 1 − (μ_X / E(X_i²))·X_i

Heteroskedasticity-robust estimators:

    σ̂²_β̂1 = (1/n)·[ 1/(n−2)·Σ (X_i − X̄)²·û_i² ] / [ (1/n)·Σ (X_i − X̄)² ]²
    σ̂²_β̂0 = (1/n)·[ 1/(n−2)·Σ Ĥ_i²·û_i² ] / [ (1/n)·Σ Ĥ_i² ]²    with Ĥ_i = 1 − ( X̄ / ((1/n)·Σ X_i²) )·X_i

Homoskedasticity only:

    σ²_β̂1 = σ_u² / (n·σ_X²)
    σ²_β̂0 = E(X_i²)·σ_u² / (n·σ_X²)
    σ̂²_β̂1 = s_û² / Σ (X_i − X̄)²
    σ̂²_β̂0 = ( (1/n)·Σ X_i² )·s_û² / Σ (X_i − X̄)²

Inference on β1:

    SE(β̂1) = √σ̂²_β̂1 (robust or homoskedasticity-only version)

    Test statistic: t = (β̂1 − β_{1,0}) / SE(β̂1) ~ t[df = n−2]

    (1 − α) confidence interval: β1 = β̂1 ± t_{α/2}·SE(β̂1)

Pearson correlation:

    Coefficient of correlation: r_XY = s_XY / (s_X·s_Y), with r_XY² = R²

    Test statistic: t = r·√( (n−2) / (1 − r²) ) ~ t[df = n−2]

Spearman rank correlation:

    Coefficient of correlation: r_s = s_ab / (s_a·s_b), with a = rank(x) and b = rank(y)

    Test statistic: r_s, or Z = r_s·√(n−1) ~ N(0, 1) if n > 30

5
Supplement 8
Overview of conditions for simple regression
Simple Linear Regression Model:

    Y_i = β0 + β1·X_i + u_i    (i = 1, ..., n)

Least Squares estimation [Ch.4]
conditions:
    1. E(u_i | X_i) = 0
    2. (X_i, Y_i) for i = 1, ..., n are i.i.d.
    3. Large outliers are unlikely: X_i and Y_i have nonzero finite fourth moments
results: LS coefficients are unbiased and consistent:
    E(β̂1) = β1 and β̂1 →p β1;    E(β̂0) = β0 and β̂0 →p β0

'Modern' inference [Ch.5.1-2]
additional condition: large n
results: LS coefficients are approximately jointly normally distributed (by the Central Limit Theorem); heteroskedasticity-robust standard errors, t-tests and confidence intervals are valid.

'Classical' inference [Ch.5.5-6]
additional conditions:
    4. Homoskedastic errors: constant var(u_i | X_i)
    5. Normally distributed errors: u_i | X_i ~ N
results:
    under 4.: LS coefficients are BLUE (by the Gauss–Markov theorem) [Ch.5.5]
    under 5.: LS coefficients are jointly normally distributed (only relevant for small n)
    under both 4. and 5.: homoskedasticity-only standard errors, t-tests and confidence intervals are valid.
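To make the contrast between 'modern' and 'classical' inference concrete, the sketch below computes both variance estimators from Supplement 7 on simulated heteroskedastic data (all numbers invented for illustration):

```python
import numpy as np

# Simulated data with heteroskedastic errors: error spread grows with X
rng = np.random.default_rng(1)
n = 200
X = rng.uniform(0, 2, size=n)
u = rng.normal(scale=0.5 + X, size=n)
Y = 1.0 + 2.0 * X + u

# OLS fit
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
uhat = Y - (b0 + b1 * X)
Sxx = np.sum((X - X.mean()) ** 2)

# Heteroskedasticity-robust variance estimator for beta1-hat
var_robust = (1 / n) * (np.sum((X - X.mean()) ** 2 * uhat**2) / (n - 2)) / (Sxx / n) ** 2

# Homoskedasticity-only variance estimator for beta1-hat
s_u2 = np.sum(uhat**2) / (n - 2)
var_homosk = s_u2 / Sxx

print("robust SE:", np.sqrt(var_robust))
print("homoskedasticity-only SE:", np.sqrt(var_homosk))
assert var_robust > 0 and var_homosk > 0
```

With errors whose variance rises in X, the two standard errors generally differ; only the robust one remains valid.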
Week 5
Homework exercises
Exercise H.5.1
Check the following equalities:
• Example a: a·X1 + a·X2 + a·X3 = a·(X1 + X2 + X3)
• Example b: (X1 + Y1) + (X2 + Y2) + (X3 + Y3) = (X1 + X2 + X3) + (Y1 + Y2 + Y3)
• Example c: a + a + a = 3a
• Example d: (a + b·X1 + c·Y1) + (a + b·X2 + c·Y2) + (a + b·X3 + c·Y3) = 3a + b·(X1 + X2 + X3) + c·(Y1 + Y2 + Y3)
These equalities are examples of general rules for summations that are given below in (a)-(d). Rules (a)-(c) are basic rules and (d) is obtained using the basic rules. Derive these rules.

(a) Σ_{i=1}^n a·X_i = a·Σ_{i=1}^n X_i

(b) Σ_{i=1}^n (X_i + Y_i) = Σ_{i=1}^n X_i + Σ_{i=1}^n Y_i

(c) Σ_{i=1}^n a = n·a

(d) Σ_{i=1}^n (a + b·X_i + c·Y_i) = n·a + b·Σ_{i=1}^n X_i + c·Σ_{i=1}^n Y_i

Exercise H.5.2
a. When you roll a common six-sided die the possible outcomes are 1, 2, 3, 4, 5, 6 eyes. Suppose instead you use a six-sided die with an adapted number of eyes: 10, 20, 30, 40, 50, 60. When n rolls are made, how will the mean, variance and standard deviation of the number of eyes change? Derive the changes with formulas.
b. Suppose you use a six-sided die with an adapted number of eyes, which are: 10, 100, 1000, 10000, 100000, 1000000 (these are 10¹, 10², 10³, 10⁴, 10⁵, 10⁶). Compared with the common die, how will the sample mean, variance and standard deviation now change?

Exercise H.5.3
Suppose that for a population of individuals X = 1 for a male and X = 0 for a female, and Y = the number of times the individual buys clothes in a month. The joint probability distribution of X and Y is given in the table below:

                     Y
             0     1     2    total
      0    .12   .21   .27     .60
  X   1    .20   .16   .04     .40
  total    .32   .37   .31    1.00

a. Calculate the fraction that buys no clothes in a month for (i) females and (ii) males.
b. Calculate E(Y | X = 0) and var(Y | X = 0).
c. Calculate E(Y) using the law of iterated expectations.
d. Check whether X and Y are independent, using the results of the two previous questions.

7
Exercise H.5.4
The random variables X and Y are yearly returns (in %) of investment funds A and B respectively. It is known that E(X) = 8, E(Y) = 6, var(X) = 47, var(Y) = 31 and cov(X, Y) = 18.
One person invests €27,000 in fund A and €13,000 in fund B, and further keeps €10,000 in a bank account with a fixed yearly return of 2%. The yield after one year for this person is then

    V = 270·X + 130·Y + 200

For another person who invests €8,000 in fund A and €19,000 in fund B while keeping €15,000 in a bank account with a fixed yearly return of 2%, the yield after one year is

    W = 80·X + 190·Y + 300

a. Calculate E(V) and var(V).
b. Calculate cov(V, W).
c. Is corr(V, W) related to corr(X, Y)? Explain.

Exercise H.5.5
Suppose you want to use a random sample to estimate the population parameter θ. For example, θ could be a population mean μ or a regression coefficient β_j. You are considering using either the estimator θ̂_A or the estimator θ̂_B.
a. Suppose that θ̂_A is an unbiased estimator, while θ̂_B is biased. Draw an example of the probability distributions of the two estimators that illustrates the difference between unbiased and biased.
b. Suppose that θ̂_A and θ̂_B are both unbiased, while θ̂_A is more efficient than θ̂_B. Draw an example of the probability distributions of the two estimators that illustrates the difference in efficiency.
c. Suppose that θ̂_A is consistent, but θ̂_B is not. Draw an example of the probability distribution of θ̂_A for n = 10, 100, 1000, 10000. Draw these four distributions in one graph that illustrates consistency. Do the same for θ̂_B, thus illustrating inconsistency.

Exercise H.5.6 (continued in 5.7)
a. What is the meaning of the abbreviation i.i.d.?
b. When you gather cross-section data of two variables, would you expect to have a case of i.i.d.? Explain. And when you have time series data of two variables?
c. When we write Y →p a, where Y is a sample statistic, what is this called and what does it mean?
d. For a random sample (i.i.d. variables): Ȳ →p μ_Y (if large outliers are unlikely). What is this statement called? What does it mean?
e. Formulate the Central Limit Theorem for the sample mean Ȳ.

Exercise H.5.7 (continuation of 5.6)
a. For a random sample (i.i.d. variables): E(Ȳ) = μ_Y.
   (i) What is the interpretation of this?
   (ii) Does this imply that E(Ȳ²) = (μ_Y)²? Explain.
b. For a random sample (i.i.d. variables): Ȳ →p μ_Y (assuming large outliers are unlikely).
   (i) What is the interpretation of this?
   (ii) Does this imply that Ȳ² →p (μ_Y)²? Explain.

8
Exercise H.5.8 – Proof that the sample variance is unbiased and consistent
Assume that Y1, …, Yn are i.i.d. with mean μ_Y and variance σ_Y². Then the sample variance

    S_Y² = 1/(n−1)·Σ_{i=1}^n (Y_i − Ȳ)² = 1/(n−1)·( Σ_{i=1}^n Y_i² − n·Ȳ² )

can be used as an estimator of the population variance σ_Y².
a. Show that E(Y_i²) = σ_Y² + μ_Y² (hint: use a familiar formula)
b. In a similar fashion, show that E(Ȳ²) = σ_Y²/n + μ_Y²
c. Prove that S_Y² is an unbiased estimator of σ_Y²
d. Assuming that large outliers are unlikely, prove that S_Y² is a consistent estimator of σ_Y²

Exercise H.5.9 – Proof that the sample covariance is unbiased and consistent
Suppose the random variable pairs (X1, Y1), …, (Xn, Yn) are i.i.d. Then the sample covariance

    s_XY = 1/(n−1)·Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) = 1/(n−1)·( Σ_{i=1}^n X_i·Y_i − n·X̄·Ȳ )

can be used as an estimator of the population covariance σ_XY.
a. Show that E(X_i·Y_i) = σ_XY + μ_X·μ_Y (hint: use a familiar formula)
b. Show that σ_X̄Ȳ = σ_XY/n (hint: cov(X_i, Y_j) = ⋯ for i ≠ j) and E(X̄·Ȳ) = σ_XY/n + μ_X·μ_Y
c. Prove that s_XY is an unbiased estimator of σ_XY
d. Assuming that large outliers are unlikely, prove that s_XY is a consistent estimator of σ_XY

Exercise H.5.10
A population consists for one half of zeros and for the other half of sixes, so that μ = 3 and σ² = 9. A random sample of size 3 is drawn from the population, yielding (X1, X2, X3). Consider three estimators of the population mean: the sample mean X̄, Y = (X1 + X2)/3, and Z = (1/5)·X1 + (2/5)·X2 + (2/5)·X3.
a. What is the bias of each estimator?
b. Which one is the most efficient?

Exercise H.5.11
From the population distribution of the random variable X with mean μ and variance σ², a random sample of size n = 3 is drawn. Consider Y = 0.6·X1 + 0.6·X2 − 0.2·X3 as an estimator of μ.
a. Calculate the expected value of Y.
b. Calculate the variance of Y.
c. Is the estimator Y unbiased? Is it efficient, compared with the sample mean X̄?
d. Suppose now that you have a set of quantitative data X_i (i = 1, …, n) sampled randomly from a population with expected value μ. We already know of X̄ as estimator of μ: X̄ = (1/n)·Σ_{i=1}^n X_i. But here, investigate this particular estimator of μ: 1/(n−1)·Σ_{i=1}^n X_i. Is it unbiased? Is it consistent? (Assume X_i has finite fourth moments.)

Exercise H.5.12
Let X and Y denote draws from a bivariate normal distribution with E(X) = E(Y) = μ and var(X) = var(Y) = 1. Suppose a covariance exists between X and Y that is equal to −0.5, so that E[(X − μ)(Y − μ)] = −1/2. Consider the following two estimators of μ:

    (i) X̄ = (1/2)·X + (1/2)·Y        (ii) X̃ = (1/3)·X + (2/3)·Y

a. Show that E(X̃) = μ. To what characteristic does the property E(X̃) = μ refer?
b. Determine the variance of both estimators, that is var(X̄) and var(X̃).
c. Which estimator is the most efficient one?

10
Week 5
Homework solutions
Solution H.5.1
a.
    Σ_{i=1}^n a·X_i = a·X1 + ⋯ + a·Xn = a·(X1 + ⋯ + Xn) = a·Σ_{i=1}^n X_i
b.
    Σ_{i=1}^n (X_i + Y_i) = (X1 + Y1) + ⋯ + (Xn + Yn) = (X1 + ⋯ + Xn) + (Y1 + ⋯ + Yn) = Σ_{i=1}^n X_i + Σ_{i=1}^n Y_i
c.
    Σ_{i=1}^n a = a + ⋯ + a = a·(1 + ⋯ + 1) = a·Σ_{i=1}^n 1 = n·a
d. Using respectively the rules in b., c. and a., we get:
    Σ_{i=1}^n (a + b·X_i + c·Y_i) = Σ_{i=1}^n a + Σ_{i=1}^n b·X_i + Σ_{i=1}^n c·Y_i = n·a + b·Σ_{i=1}^n X_i + c·Σ_{i=1}^n Y_i

Solution H.5.2
a) Every possible outcome is multiplied by 10. Then the mean is also multiplied by 10, the variance is multiplied by 10² = 100, and the standard deviation is multiplied by √100 = 10. This can be derived as follows, where X_i are the original outcomes and Y_i the adapted ones:

    Y_i = 10·X_i

    Ȳ = (1/n)·Σ Y_i = (1/n)·Σ 10·X_i = 10·(1/n)·Σ X_i = 10·X̄

    s_Y² = 1/(n−1)·Σ (Y_i − Ȳ)² = 1/(n−1)·Σ (10·X_i − 10·X̄)² = 1/(n−1)·Σ 100·(X_i − X̄)² = 100·s_X²

    s_Y = √(s_Y²) = √(100·s_X²) = √100·√(s_X²) = 10·s_X

If the question were about the population (instead of a sample of size n), then we would use:

    Y = 10·X ⟹ E(Y) = 10·E(X) = 35,  var(Y) = 10²·var(X),  σ_Y = √var(Y) = 10·σ_X

b) In this case, the transformation is not linear: Y_i = 10^{X_i}. To calculate what happens with the sample mean Ȳ, variance s_Y² and standard deviation s_Y, we need the actual n observations.
For the same reason of nonlinearity of Y = 10^X we cannot use equations like E(aX + b) = a·E(X) + b and var(aX + b) = a²·var(X). Using E(a^X) = a^{E(X)} will fail, because it is false.
We would have to calculate, for example:

    E(10^X) = Σ_{x=1}^6 10^x·P(X = x) = (1/6)·(10 + 100 + ⋯ + 1 000 000) = 185 185

(note: 10^{E(X)} = 10^{3.5} ≈ 3 162.28 ≠ E(10^X) = 185 185, so E(a^X) ≠ a^{E(X)})
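The nonlinearity point in part b can be checked with exact fractions (an illustrative sketch):

```python
from fractions import Fraction

# Outcomes of the common die; the transformed die shows 10^x eyes
outcomes = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

E_X = sum(p * x for x in outcomes)             # exact E(X) = 7/2
E_10_pow_X = sum(p * 10**x for x in outcomes)  # exact E(10^X)

assert E_X == Fraction(7, 2)
assert E_10_pow_X == Fraction(1111110, 6)      # reduces to 185185
assert float(E_10_pow_X) == 185185.0

# 10^{E(X)} = 10^{3.5} is nowhere near E(10^X)
assert 10 ** float(E_X) != float(E_10_pow_X)
```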

11
Solution H.5.3
a. P(Y = 0 | X = 0) = 0.12/0.6 = 0.2, so 20% for females
   P(Y = 0 | X = 1) = 0.20/0.4 = 0.5, so 50% for males
b. P(Y = 0 | X = 0) = 0.12/0.6 = 0.20
   P(Y = 1 | X = 0) = 0.21/0.6 = 0.35
   P(Y = 2 | X = 0) = 0.27/0.6 = 0.45
   E(Y | X = 0) = 0·0.2 + 1·0.35 + 2·0.45 = 1.25
   E(Y² | X = 0) = 0²·0.2 + 1²·0.35 + 2²·0.45 = 2.15
   var(Y | X = 0) = E(Y² | X = 0) − [E(Y | X = 0)]² = 2.15 − 1.25² = 0.5875
c. E(Y) = E[E(Y|X)] = E(Y | X = 0)·P(X = 0) + E(Y | X = 1)·P(X = 1) = 1.25·0.6 + 0.6·0.4 = 0.99
   note: E(Y | X = 1) = 0·(0.20/0.4) + 1·(0.16/0.4) + 2·(0.04/0.4) = 0.6
d. Dependent, because E(Y | X = 0) = 1.25 is not equal to E(Y) = 0.99.
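The conditional expectations and the law of iterated expectations above can be verified directly from the joint probability table (an illustrative sketch):

```python
import numpy as np

# Joint distribution from Exercise H.5.3: rows X = 0, 1; columns Y = 0, 1, 2
P = np.array([[0.12, 0.21, 0.27],
              [0.20, 0.16, 0.04]])
y = np.array([0, 1, 2])

pX = P.sum(axis=1)                        # marginal of X: [0.6, 0.4]
E_Y_given_X0 = (y * P[0] / pX[0]).sum()   # conditional expectations
E_Y_given_X1 = (y * P[1] / pX[1]).sum()

assert np.isclose(E_Y_given_X0, 1.25)
assert np.isclose(E_Y_given_X1, 0.60)

# Law of iterated expectations: E(Y) = sum over x of E(Y | X = x) P(X = x)
E_Y = E_Y_given_X0 * pX[0] + E_Y_given_X1 * pX[1]
assert np.isclose(E_Y, 0.99)

# Conditional variance of Y given X = 0
E_Y2_given_X0 = (y**2 * P[0] / pX[0]).sum()
assert np.isclose(E_Y2_given_X0 - E_Y_given_X0**2, 0.5875)
```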
Solution H.5.4
a. V = 270·X + 130·Y + 200
   E(V) = 270·E(X) + 130·E(Y) + 200 = 270·8 + 130·6 + 200 = 3 140
   var(V) = 270²·var(X) + 130²·var(Y) + 2·270·130·cov(X, Y) = 270²·47 + 130²·31 + 2·270·130·18 = 5 213 800
b. cov(V, W) = cov(270·X + 130·Y + 200, 80·X + 190·Y + 300)
   = 270·80·cov(X, X) + 270·190·cov(X, Y) + 130·80·cov(Y, X) + 130·190·cov(Y, Y)
   = 270·80·47 + 270·190·18 + 130·80·18 + 130·190·31 = 2 891 500
c. If V were a linear transformation of only X and W a linear transformation of only Y, then we could immediately say that corr(V, W) = corr(X, Y). But that is not the case here. Here, we need to calculate:
   W = 80·X + 190·Y + 300
   var(W) = 80²·var(X) + 190²·var(Y) + 2·80·190·cov(X, Y) = 80²·47 + 190²·31 + 2·80·190·18 = 1 967 100
   corr(V, W) = cov(V, W) / (√var(V)·√var(W)) = 0.903 ≠ 0.472 = cov(X, Y) / (√var(X)·√var(Y)) = corr(X, Y)
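The arithmetic in parts a to c can be reproduced with the linear-combination rules (an illustrative sketch):

```python
import numpy as np

# Moments given in Exercise H.5.4
EX, EY = 8, 6
varX, varY, covXY = 47, 31, 18

# V = 270X + 130Y + 200, W = 80X + 190Y + 300
E_V = 270 * EX + 130 * EY + 200
var_V = 270**2 * varX + 130**2 * varY + 2 * 270 * 130 * covXY
var_W = 80**2 * varX + 190**2 * varY + 2 * 80 * 190 * covXY
cov_VW = 270 * 80 * varX + (270 * 190 + 130 * 80) * covXY + 130 * 190 * varY

assert E_V == 3140
assert var_V == 5213800
assert var_W == 1967100
assert cov_VW == 2891500

corr_VW = cov_VW / np.sqrt(var_V * var_W)
corr_XY = covXY / np.sqrt(varX * varY)
assert round(corr_VW, 3) == 0.903
assert round(corr_XY, 3) == 0.472
```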

Solution H.5.5
a., b., c. [Sketches of the probability distributions are not reproduced in this text version.]

Note that for consistency it is not required that the estimator is unbiased; it is sufficient that the estimator is asymptotically unbiased.

Solution H.5.6
a. Independently and identically distributed.
b. In case of bivariate cross-sectional data (X1, Y1), …, (Xn, Yn), the n objects for which the two variables (X_i, Y_i) are measured are obtained by repeated independent random draws from the same population, so that you get a sequence of i.i.d. pairs of random variables.
   Time-series data belong to consecutive time periods (or moments in time), which are thus not drawn independently, and as a result the consecutive pairs of random variables (X1, Y1), …, (Xn, Yn) are likely to be dependent.
c. We say that Y converges in probability to a. Formally it means: when the sample size n increases to infinity, then P(a − c < Y < a + c) converges to 1 for any given value c > 0.
   In words: when n increases to infinity, the probability that Y will be arbitrarily close to a (as close as you like) increases to 1. Or: when n increases to infinity, then asymptotically Y will deviate only an arbitrarily small amount from a.
   Effectively: when n increases to infinity, the probability distribution of Y becomes fully concentrated at a.
d. The statement is called the Law of Large Numbers. It says that the sample mean converges in probability to the population mean. (Convergence in probability is explained in the previous question.)
e. If Y1, …, Yn are i.i.d. and each Y_i has mean μ_Y and positive finite variance σ_Y², then for increasing values of n (up to infinity) the distribution of

    (Ȳ − μ_Y) / (σ_Y/√n)

becomes arbitrarily well approximated by the standard normal distribution (and for increasing values of n the distribution of Ȳ becomes arbitrarily well approximated by the normal distribution with mean μ_Y and standard deviation σ_Y/√n).
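The Central Limit Theorem in part e can be illustrated by simulation; the sketch below standardizes means of skewed (exponential) draws, with the distribution and sample sizes invented for illustration:

```python
import numpy as np

# Exponential(1) draws: mean 1, standard deviation 1, strongly skewed
rng = np.random.default_rng(2)
mu, sigma = 1.0, 1.0
n, reps = 500, 5000

# Standardize the sample mean of each replication
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = (means - mu) / (sigma / np.sqrt(n))

# The standardized sample mean should be close to N(0, 1)
assert abs(z.mean()) < 0.05
assert abs(z.std() - 1) < 0.05
# P(Z <= 1.96) should be near the standard normal value 0.975
assert abs((z <= 1.96).mean() - 0.975) < 0.02
```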

13
Solution H.5.7
a. (i) Interpretation: Ȳ is an unbiased estimator of μ_Y. Over repeated samples the estimated value is on average correct, so there is no systematic over- or underestimation.
   (ii) It is not implied that Ȳ² has expected value (μ_Y)², because an expected value can only be transferred with linear functions, not with nonlinear functions such as a quadratic function. Consequently, Ȳ² is not an unbiased estimator of (μ_Y)².
b. (i) Interpretation of Ȳ →p μ_Y: Ȳ converges in probability to μ_Y, so when n increases to infinity, then asymptotically Ȳ will deviate only an arbitrarily small amount from μ_Y. This means that Ȳ is a consistent estimator of μ_Y.
   (ii) It is indeed implied that Ȳ² →p (μ_Y)², as continuous functions preserve convergence in probability, and a quadratic function is indeed continuous. Consequently Ȳ² is a consistent estimator of (μ_Y)².

Solution H.5.8
a. σ_Y² = E(Y_i²) − μ_Y² ⟹ E(Y_i²) = σ_Y² + μ_Y²
b. σ_Ȳ² = E(Ȳ²) − μ_Ȳ² ⟹ E(Ȳ²) = σ_Ȳ² + μ_Ȳ² = σ_Y²/n + μ_Y², because σ_Ȳ² = σ_Y²/n and μ_Ȳ = μ_Y
c.
    E(S_Y²) = E[ 1/(n−1)·(Σ Y_i² − n·Ȳ²) ] = 1/(n−1)·[ Σ E(Y_i²) − n·E(Ȳ²) ]
            = 1/(n−1)·[ n·(σ_Y² + μ_Y²) − n·(σ_Y²/n + μ_Y²) ] = 1/(n−1)·(n·σ_Y² + n·μ_Y² − σ_Y² − n·μ_Y²)
            = 1/(n−1)·(n − 1)·σ_Y² = σ_Y²
d. Based on the law of large numbers, we have

    Ȳ = (1/n)·Σ Y_i →p E(Y_i) = μ_Y    and    (1/n)·Σ Y_i² →p E(Y_i²) = σ_Y² + μ_Y²

In turn,

    S_Y² = 1/(n−1)·(Σ Y_i² − n·Ȳ²) = n/(n−1)·( (1/n)·Σ Y_i² − Ȳ² ) →p 1·(σ_Y² + μ_Y² − μ_Y²) = σ_Y²

since n/(n−1) → 1 for n → ∞ and continuous functions preserve convergence in probability.
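Parts c and d can also be illustrated by Monte Carlo simulation (an illustrative sketch; the distribution and parameters are invented):

```python
import numpy as np

# Monte Carlo sketch: S^2 (ddof=1) is unbiased and consistent for sigma^2
rng = np.random.default_rng(3)
sigma2 = 4.0

# Unbiasedness: the average of S^2 over many small samples is close to sigma^2
reps, n = 100000, 5
samples = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)
assert abs(s2.mean() - sigma2) < 0.05

# Consistency: for one large sample, S^2 is close to sigma^2
big = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=1_000_000)
assert abs(big.var(ddof=1) - sigma2) < 0.05
```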

Solution H.5.9
a. σ_XY = E(X_i·Y_i) − μ_X·μ_Y ⟹ E(X_i·Y_i) = σ_XY + μ_X·μ_Y
b. Since X_i and Y_j are independent for i ≠ j, we have cov(X_i, Y_j) = 0 for i ≠ j, so that

    σ_X̄Ȳ = cov( (1/n)·Σ_{i=1}^n X_i, (1/n)·Σ_{j=1}^n Y_j ) = (1/n²)·cov( Σ_{i=1}^n X_i, Σ_{j=1}^n Y_j ) = (1/n²)·Σ_{i=1}^n cov(X_i, Y_i) = (1/n²)·n·σ_XY = σ_XY/n

As a result,

    σ_X̄Ȳ = E(X̄·Ȳ) − μ_X̄·μ_Ȳ ⟹ E(X̄·Ȳ) = σ_X̄Ȳ + μ_X̄·μ_Ȳ = σ_XY/n + μ_X·μ_Y

because μ_X̄ = μ_X and μ_Ȳ = μ_Y.
c.
    E(s_XY) = E[ 1/(n−1)·(Σ X_i·Y_i − n·X̄·Ȳ) ] = 1/(n−1)·[ Σ E(X_i·Y_i) − n·E(X̄·Ȳ) ]
            = 1/(n−1)·[ n·(σ_XY + μ_X·μ_Y) − n·(σ_XY/n + μ_X·μ_Y) ] = 1/(n−1)·(n·σ_XY + n·μ_X·μ_Y − σ_XY − n·μ_X·μ_Y)
            = 1/(n−1)·(n − 1)·σ_XY = σ_XY
d. Based on the law of large numbers, we have

    X̄ = (1/n)·Σ X_i →p E(X_i) = μ_X,    Ȳ →p μ_Y    and    (1/n)·Σ X_i·Y_i →p E(X_i·Y_i) = σ_XY + μ_X·μ_Y

In turn,

    s_XY = 1/(n−1)·(Σ X_i·Y_i − n·X̄·Ȳ) = n/(n−1)·( (1/n)·Σ X_i·Y_i − X̄·Ȳ ) →p 1·(σ_XY + μ_X·μ_Y − μ_X·μ_Y) = σ_XY

since n/(n−1) → 1 for n → ∞ and continuous functions preserve convergence in probability.

Solution H.5.10
a. E(X̄) = μ is known, so the bias is 0.
   E(Y) = (E(X1) + E(X2))/3 = (μ + μ)/3 = (2/3)·μ, so the bias is (2/3)·μ − μ = −(1/3)·μ.
   E(Z) = (1/5)·E(X1) + (2/5)·E(X2) + (2/5)·E(X3) = (1/5)·μ + (2/5)·μ + (2/5)·μ = μ, so the bias is 0.
b. We only need to compare the variances of the unbiased estimators.
   var(X̄) = σ²/3 is known.
   var(Z) = var((1/5)·X1) + var((2/5)·X2) + var((2/5)·X3) = (1/5)²·var(X1) + (2/5)²·var(X2) + (2/5)²·var(X3)
          = (1/25)·σ² + (4/25)·σ² + (4/25)·σ² = (9/25)·σ²
   (1/3)·σ² < (9/25)·σ², so X̄ is more efficient than Z.

Solution H.5.11
a. E(Y) = 0.6·E(X1) + 0.6·E(X2) − 0.2·E(X3) = 0.6·μ + 0.6·μ − 0.2·μ = μ
b. var(Y) = var(0.6·X1) + var(0.6·X2) + var(−0.2·X3), since the three terms are mutually independent.
   var(Y) = 0.6²·var(X1) + 0.6²·var(X2) + (−0.2)²·var(X3) = 0.36·σ² + 0.36·σ² + 0.04·σ² = 0.76·σ²
c. Y is unbiased, since E(Y) = μ.
   Y is not efficient compared with X̄, since var(Y) = 0.76·σ² > var(X̄) = (1/3)·σ².
d.
    E( 1/(n−1)·Σ_{i=1}^n X_i ) = 1/(n−1)·Σ_{i=1}^n E(X_i) = 1/(n−1)·(μ + ⋯ + μ) = 1/(n−1)·n·μ = n/(n−1)·μ ≠ μ,

so it is biased.

    var( 1/(n−1)·Σ_{i=1}^n X_i ) = (1/(n−1))²·var( Σ_{i=1}^n X_i ) = 1/(n−1)²·Σ_{i=1}^n var(X_i) = 1/(n−1)²·(σ² + ⋯ + σ²) = n/(n−1)²·σ²

When n increases to infinity, the expected value n/(n−1)·μ = 1/(1 − 1/n)·μ converges to μ, while the variance n/(n−1)²·σ² converges to 0. This means that the estimator is consistent, even though it is biased.

Solution H.5.12
a) E(X̃) = E( (1/3)·X + (2/3)·Y ) = (1/3)·μ + (2/3)·μ = μ, so X̃ is an unbiased estimator; the property E(X̃) = μ refers to unbiasedness.
b) var(X̄) = (1/2)²·var(X) + (1/2)²·var(Y) + 2·(1/2)·(1/2)·cov(X, Y) = 1/4·1 + 1/4·1 − 1/4 = 1/4
   var(X̃) = (1/3)²·var(X) + (2/3)²·var(Y) + 2·(1/3)·(2/3)·cov(X, Y) = 1/9·1 + 4/9·1 − 2/9 = 3/9 = 1/3
c) var(X̃) > var(X̄), so X̄ is more efficient.

16
Week 5
Tutorial exercises
Question T.5.1
An airline company has a file containing specific information with respect to all their customers. Based on that file, the following relative frequency table is determined, where for a given year:
X = the number of private flights of the customer
Y = the number of business flights of the customer

                       Y
             0     1     2     3    total
      0    0.00  0.24  0.11  0.05    0.40
  X   1    0.23  0.15  0.08  0.04    0.50
      2    0.07  0.02  0.01  0.00    0.10
  total    0.30  0.41  0.20  0.09    1.00

Assume this table represents the population distribution of (X, Y) when a random customer is drawn. One can calculate from the table:

    E(X) = 0.7     var(X) = 0.41      cov(X, Y) = −0.246
    E(Y) = 1.08    var(Y) = 0.8536    E(Y | X = 0) = 1.525    E(Y | X = 1) = 0.86

a. Give the probability distribution of the number of business flights, when it is given that the customer took 2 private flights.
b. Are X and Y independent? Show it.
c. Determine the expected number of business flights, when it is given that the customer took 2 private flights. (Always use the proper notation!) How does your answer provide information about (in)dependence of X and Y?
d. Determine the standard deviation of the number of business flights, when it is given that the customer took 2 private flights.
e. Show the calculation of E(Y) according to the law of iterated expectations.
Suppose the revenue generated by the private flights of a customer is G = 142·X − 21 euro, while for the business flights this revenue is H = 197·Y − 34 euro.
f. Calculate var(G) and var(H).
g. Calculate cov(G, H).
h. Calculate corr(G, H) and compare it with corr(X, Y).

Question T.5.2
a. Suppose the random variables X and Y are uncorrelated. Are X and Y independent?
b. Is the sample variance S² of i.i.d. variables an unbiased and consistent estimator of the population variance σ² when large outliers are unlikely? (For a proof: see H.5.8.) Express both properties with formulas and interpret.
c. Is it implied that S is an unbiased and consistent estimator of σ?
d. Is the sample covariance S_XY of i.i.d. pairs of variables an unbiased and consistent estimator of the population covariance σ_XY when large outliers are unlikely? (For a proof: see H.5.9.) And is the sample correlation r_XY then an unbiased and consistent estimator of the population correlation corr(X_i, Y_i)?

Question T.5.3
Consider the model Y_i = α + u_i, where α is a parameter and u_i is a random error term. (This is a linear regression model with only a constant.) Assume that E(u_i) = 0, var(u_i) = σ² and that the errors are independent.
a. Determine the least squares estimator of α.
b. Prove that the least squares estimator of α is unbiased.
c. Calculate the (population) variance of the least squares estimator of α.
d. Show that the least squares estimator of α is consistent.
e. What is the distribution of the least squares estimator of α?
f. Consider the alternative estimator: a* = (Y1 + Y2)/2. Prove it is unbiased and calculate its population variance. Is this estimator consistent and efficient?
g. Is unbiased a stronger property than consistent? Or vice versa?

Question T.5.4
Suppose the following commands are given in STATA in an empty data file:
    set obs 1000
    generate x=2+4*rnormal()
    gen y =3+5*x+7*rnormal()
These commands generate a random sample of (x, y).
a. What is the population distribution of X?
b. What is the population distribution of Y?
c. What is the population covariance of X and Y?
d. What type of distribution does (X, Y) have?
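For readers without STATA at hand, a hypothetical Python analogue of the same data-generating commands looks like this (it does not answer the questions; it only reproduces the simulation so you can compare sample moments with your own derivations):

```python
import numpy as np

# Python analogue (illustrative) of the STATA commands:
#   set obs 1000
#   generate x=2+4*rnormal()
#   gen y =3+5*x+7*rnormal()
rng = np.random.default_rng(4)   # seed chosen arbitrarily
n = 1000
x = 2 + 4 * rng.standard_normal(n)
y = 3 + 5 * x + 7 * rng.standard_normal(n)

# Inspect the sample moments and compare them with your answers to a.-d.
print("mean(x) =", x.mean(), " var(x) =", x.var(ddof=1))
print("mean(y) =", y.mean(), " cov(x, y) =", np.cov(x, y, ddof=1)[0, 1])

assert x.shape == (1000,) and y.shape == (1000,)
assert np.isfinite(x).all() and np.isfinite(y).all()
```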

18
Week 6
Homework exercises
Exercise H.6.1 (continued in H.7.1)
Fifteen observations were taken to estimate a simple regression model. These summations were produced:

    Σ X_i = 50    Σ X_i² = 250    Σ Y_i = 100    Σ Y_i² = 1 100    Σ X_i·Y_i = 500

a. Find the least squares regression line. Interpret the coefficients.
b. What is the regression's prediction of Y for X = 2.9?
c. Calculate SSR, s_û² and s_û. Interpret s_û.
d. Calculate the coefficient of determination and interpret it.
e. Complete the ANOVA table:

    ANOVA        Sum of Squares    df    Mean Square
    Explained
    Residuals
    Total
    R²

Exercise H.6.2 (continued in H.7.2)
The manager of a large furniture store wanted to determine the effectiveness of her advertising. The furniture store regularly runs several ads per month in the local newspaper. The manager wanted to know if the number of ads influenced the number of customers. During the past eight months, she kept track of both figures:

    month    X = number of ads    Y = number of customers
      1              5                     528
      2             12                     876
      3              8                     653
      4              6                     571
      5              4                     556
      6             15                    1058
      7             10                     963
      8              7                     719

a. Find the equation of the regression line. Interpret the coefficients.
b. Calculate and interpret the standard error of the regression.
c. Compute and interpret the coefficient of determination.
d. Summarize the sums of squares etc. in an ANOVA table.

Exercise H.6.3 (continued in H.7.3)


Physicians have been recommending more exercise to their patients, particularly those who are
overweight. One benefit of regular exercise appears to be a reduction in cholesterol, a substance
associated with heart disease. To study the relationship more carefully, a physician took a random sample
of 50 patients who do not exercise. She measured their cholesterol levels. She then started them on regular
exercise programs. After 4 months, she asked each patient how many minutes per week (on average) he or

19
she exercised and she also measured his or her cholesterol levels. The results have been analyzed in SPSS
using the following variable:
EXERCISE = weekly exercises in minutes
BEFORE = cholesterol level before exercise program
AFTER = cholesterol level after exercise program
The reduction in cholesterol is calculated as: RED = BEFORE − AFTER
REGRESSION
Variables Entered/Removedb

Variables Variables
Model Entered Removed Method
1 EXERCISEa . Enter
a. All requested variables entered.
b. Dependent Variable: RED

Model Summary

Adjusted Std. Error of


Model R R Square R Square the Estimate
1 .714a .510 .499 10.5293
a. Predictors: (Constant), EXERCISE

ANOVAb

Sum of
Model Squares df Mean Square F Sig.
1 Regression ..... 1 ..... ..... .000a
Residual ..... 48 .....
Total ..... 49
a. Predictors: (Constant), EXERCISE
b. Dependent Variable: RED
Coefficientsa

Unstandardized Coefficients   Standardized Coefficients
Model B Std. Error Beta t Sig.
1 (Constant) ..... 3.939 ..... .605
EXERCISE ..... ..... .714 ..... .000
a. Dependent Variable: RED
CORRELATIONS
Descriptive Statistics

Mean Std. Deviation N


EXERCISE 283.1400 116.7960 50
RED 27.8000 14.8805 50

20
Correlations

EXERCISE RED
EXERCISE Pearson Correlation 1.000 .714**
Sig. (2-tailed) . .000
Sum of Squares and
668424.02 60789.400
Cross-products
Covariance 13641.307 1240.600
N 50 50
RED Pearson Correlation .714** 1.000
Sig. (2-tailed) .000 .
Sum of Squares and
60789.400 10850.000
Cross-products
Covariance 1240.600 221.429
N 50 50
**. Correlation is significant at the 0.01 level (2-tailed).

a. Determine the sample regression line. Interpret the coefficients.


b. Make an ANOVA table.
c. Give an interpretation of 𝑅𝑅 2 and 𝑠𝑠𝜀𝜀 .

Question H.6.4 (S&W 4.2, modified)


A random sample of 100 20-year-old men is selected from a population and these men’s height and weight
are recorded. A regression of weight on height yields
Ŵeight = −79.24 + 4.16 ∙ Height,   R² = 0.72,   SER = 12.6
where Weight is measured in pounds and Height is measured in inches.
a. What is the regression’s weight prediction for someone who is 64 inches tall? 68 inches tall? 72
inches tall?
b. A man has a late growth spurt and grows 2 inches over the course of a year. What is the
regression’s prediction for the increase in this man’s weight?
c. Suppose that instead of measuring weight and height in pounds (𝑙𝑙𝑙𝑙) and inches (𝑖𝑖𝑖𝑖), these variables
are measured in centimeters (𝑐𝑐𝑐𝑐) and kilograms (𝑘𝑘𝑘𝑘), where
1 𝑖𝑖𝑖𝑖 = 2.54 𝑐𝑐𝑐𝑐 and 1 lb = 0.4536 kg
What are the regression estimates from this new centimeter–kilogram regression? (Give all results:
estimated coefficients, R², and SER.)

H.6.5 (S&W 4.3, modified)


A regression of average monthly expenditure (𝐴𝐴𝐴𝐴𝐴𝐴) on average monthly income (𝐴𝐴𝐴𝐴𝐴𝐴) using a random
sample of college educated full-time workers earning €100 to €1.5 million yields the following:

ÂME = 710.7 + 8.8 ∙ AMI,   R² = 0.030,   SER = 540.30

a) Explain what the coefficient values 710.7 and 8.8 mean


b) The standard error of the regression (𝑆𝑆𝑆𝑆𝑆𝑆) is 540.30. What are the units of measurement for the
𝑆𝑆𝑆𝑆𝑆𝑆? (Euros? Or is it unit free?)
c) The regression R² is 0.030. What are the units of measurement for the R²? (Euros? Or is R² unit
free?)

21
d) What does the regression predict will be the expenditure of a person with an income of €100? With
an income of €200?
e) Will the regression give reliable predictions for a person with an income of €2 million? Why or why
not?
f) Given what you know about the distribution of earnings, do you think it is plausible that the distri-
bution of errors in the regression is normal? (Hint: Do you think that the distribution is symmetric
or skewed? What is the smallest value of earnings, and is it consistent with a normal distribution?)

H.6.6 (S&W 4.5, modified)


A researcher runs an experiment to measure the impact of a short nap on memory. There are 200 partici-
pants and they can take a short nap of either 60 minutes or 75 minutes. After waking up, each participant
takes a short test for short-term recall.
Each participant is randomly assigned one of the examination times, based on the flip of a coin. Let 𝑌𝑌𝑖𝑖 de-
note the number of points scored on the test by the 𝑖𝑖th participant (0 ≤ 𝑌𝑌𝑖𝑖 ≤ 100), let 𝑋𝑋𝑖𝑖 denote
the amount of time for which the participant slept prior to taking the test (𝑋𝑋𝑖𝑖 = 60 or 𝑋𝑋𝑖𝑖 = 75), and con-
sider the regression model
𝑌𝑌𝑖𝑖 = 𝛽𝛽0 + 𝛽𝛽1 𝑋𝑋𝑖𝑖 + 𝑢𝑢𝑖𝑖

a) Explain what the term 𝑢𝑢𝑖𝑖 represents. Why will different participants have different values of 𝑢𝑢𝑖𝑖 ?
b) What is 𝐸𝐸(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 ) ? Are the estimated coefficients unbiased?
c) The estimated regression is
Ŷᵢ = 55 + 0.17 Xᵢ
Compute the estimated gain in score for a participant who is given an additional 5 minutes to nap.

H.6.7 (midterm 27-09-2012, modified)


In a French study the relation between the average temperature C (in °C) and the average price of a box of
Merlot wine E (in €) was estimated, giving
Êᵢ = β̂₀ + β̂₁ Cᵢ = −4.32 + 2.16 Cᵢ
An economist in the US wants to express this relation after converting the temperature into °F and the
price into $. Defining F = 32 + 1.8C and D = 1.25E, the converted relation is
D̂ᵢ = γ̂₀ + γ̂₁ Fᵢ = −53.4 + 1.5 Fᵢ
The economist in the US wonders whether he would have obtained the same values of 𝛾𝛾�0 and 𝛾𝛾�1 if he
converted the original data for 𝐸𝐸 and 𝐶𝐶 into data for 𝐷𝐷 and 𝐹𝐹, and next estimated the model with the
converted data. Show that this would indeed lead to the same result.

Exercise H.6.8 (resit 30-6-2021 Q2a(iii,iv))


In STATA you give the following commands:
set obs 30
gen x=5+2*rnormal()
gen u=7*rnormal()
gen y=60-3*x+u
Find 𝐸𝐸(𝑌𝑌), 𝑣𝑣𝑣𝑣𝑣𝑣(𝑌𝑌), 𝐸𝐸(𝑌𝑌|𝑋𝑋) and 𝑣𝑣𝑣𝑣𝑣𝑣(𝑌𝑌|𝑋𝑋).

Exercise H.6.9 (endterm 20-12-2021 Q3)


a. Show that the condition 𝐸𝐸(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 ) = 0 implies that 𝑐𝑐𝑐𝑐𝑐𝑐(𝑋𝑋𝑖𝑖 , 𝑢𝑢𝑖𝑖 ) = 0.
b. The theoretical relationship between two variables 𝑋𝑋 and 𝑌𝑌 is given by the model
𝑌𝑌𝑖𝑖 = 𝛽𝛽0 + 𝛽𝛽1 𝑋𝑋𝑖𝑖 + 𝑢𝑢𝑖𝑖

22
which satisfies the LS assumptions. Unfortunately, the role of the two variables was confused and
the following sample regression function was estimated using the least squares estimator:
𝑋𝑋𝑖𝑖 = 𝛾𝛾0 + 𝛾𝛾1 𝑌𝑌𝑖𝑖 + 𝜈𝜈𝑖𝑖
Show with a formula how 1/γ̂₁ is calculated and find the value to which 1/γ̂₁ converges in
probability. Also prove that 1/γ̂₁ is an inconsistent estimator of β₁.

Exercise H.6.10 (midterm 24-9-2018 Q3, modified)


Consider the model
𝑌𝑌𝑖𝑖 = 𝛽𝛽0 + 𝛽𝛽1 𝑋𝑋𝑖𝑖 + 𝑢𝑢𝑖𝑖 for 𝑖𝑖 = 1, … , 𝑛𝑛
This model satisfies the LS-assumptions (Key Concept 4.3) and 𝑛𝑛 is large.
Rather than the least squares estimator β̂₁, an alternative estimator of β₁ is defined by
b₁ = s_XY / s_X² + 1/n = β̂₁ + 1/n
a. Derive 𝐸𝐸(𝑏𝑏1 ) and 𝑣𝑣𝑣𝑣𝑣𝑣(𝑏𝑏1 ) where you may use 𝐸𝐸(𝛽𝛽̂1 ) = 𝛽𝛽1 and the formula of 𝑣𝑣𝑎𝑎𝑎𝑎�𝛽𝛽̂1 �.
b. Check mathematically whether 𝑏𝑏1 is unbiased, and also whether 𝑏𝑏1 is consistent.

23
Week 6
Homework solutions
Formulas
Sample variances and sample covariances can be calculated by their defining formulas:
s_X² = (1/(n−1)) ∙ Σ(Xᵢ − X̄)²   and   s_XY = (1/(n−1)) ∙ Σ(Xᵢ − X̄)(Yᵢ − Ȳ)
and by short-cut formulas which are often more convenient:
s_X² = (1/(n−1)) ∙ (Σ Xᵢ² − n X̄²)   and   s_XY = (1/(n−1)) ∙ (Σ XᵢYᵢ − n X̄ Ȳ)

Solution H.6.1
Σ Xᵢ = 50,   Σ Xᵢ² = 250,   Σ Yᵢ = 100,   Σ Yᵢ² = 1100,   Σ XᵢYᵢ = 500

a)
s_X² = (1/(n−1)) ∙ (Σ Xᵢ² − (1/n)(Σ Xᵢ)²) = (1/14) ∙ (250 − (1/15) ∙ 50²) = 5.952
s_XY = (1/(n−1)) ∙ (Σ XᵢYᵢ − (1/n) Σ Xᵢ Σ Yᵢ) = (1/14) ∙ (500 − (1/15) ∙ 50 ∙ 100) = 11.905
β̂₁ = s_XY / s_X² = 2.000 is the slope of the sample regression line
X̄ = (1/n) Σ Xᵢ = 50/15 = 3.333   and   Ȳ = (1/n) Σ Yᵢ = 100/15 = 6.667
β̂₀ = Ȳ − β̂₁ X̄ = 0 is the intercept of the sample regression line
Sample regression line: Ŷ = β̂₀ + β̂₁ X = 2X
b) Ŷ = 2 ∙ 2.9 = 5.8
c)
s_Y² = (1/(n−1)) ∙ (Σ Yᵢ² − (1/n)(Σ Yᵢ)²) = (1/14) ∙ (1100 − (1/15) ∙ 100²) = 30.952
SSR = (n−1) ∙ (s_Y² − s_XY²/s_X²) = 14 ∙ (30.952 − 11.905²/5.952) = 100.0
s_û² = SSR/(n−2) = 100.0/13 = 7.692  ⟹  SER = s_û = √s_û² = 2.774
The standard error of the regression (SER) is an estimator of the standard deviation of the
regression error uᵢ = Yᵢ − β₀ − β₁Xᵢ.
d)
s_Y² = (1/(n−1)) ∙ TSS  ⟹  TSS = 14 ∙ 30.952 = 433.328
R² = ESS/TSS = (TSS − SSR)/TSS = 0.769
The regression R² is the fraction of the sample variance s_Y² explained by (or predicted by) X.
The regression 𝑹𝑹𝟐𝟐 is the fraction of the sample variance 𝑠𝑠𝑌𝑌2 explained by (or predicted by) 𝑋𝑋

24
e)
ANOVA       Sum of Squares   df   Mean Square
Explained   ESS = 333.328     1
Residuals   SSR = 100.0      13   s_û² = 7.692
Total       TSS = 433.328    14   s_Y² = 30.952
R² = 0.769
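All of the quantities in parts a)–e) follow from the five sums at the top of this solution. As a check, here is a short sketch in Python (the course software is STATA; Python is used here only for illustration):

```python
# Reproduce Solution H.6.1 from the summary statistics only.
n = 15
sum_x, sum_x2 = 50, 250
sum_y, sum_y2 = 100, 1100
sum_xy = 500

# Short-cut formulas for the sample (co)variances.
s_x2 = (sum_x2 - sum_x**2 / n) / (n - 1)       # ≈ 5.952
s_y2 = (sum_y2 - sum_y**2 / n) / (n - 1)       # ≈ 30.952
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)  # ≈ 11.905

b1 = s_xy / s_x2                  # slope, ≈ 2.000
b0 = sum_y / n - b1 * sum_x / n   # intercept, ≈ 0

tss = (n - 1) * s_y2                     # total sum of squares
ssr = (n - 1) * (s_y2 - s_xy**2 / s_x2)  # residual sum of squares, ≈ 100.0
r2 = 1 - ssr / tss                       # ≈ 0.769

print(round(b1, 3), round(ssr, 1), round(r2, 3))
```

The same short-cut formulas apply to Solutions H.6.2 and H.6.3 with their own sums.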

Solution H.6.2
Σ Xᵢ = 67,   Σ Xᵢ² = 659,   Σ Yᵢ = 5924,   Σ Yᵢ² = 4 671 440,   Σ XᵢYᵢ = 54 559

a) So
β̂₁ = s_XY / s_X² = (Σ XᵢYᵢ − (1/n) Σ Xᵢ Σ Yᵢ) / (Σ Xᵢ² − (1/n)(Σ Xᵢ)²)
   = (54 559 − (1/8) ∙ 67 ∙ 5924) / (659 − (1/8) ∙ 67²) = 4945.5 / 97.875 = 50.529
β̂₀ = (1/n) Σ Yᵢ − β̂₁ ∙ (1/n) Σ Xᵢ = (1/8) ∙ 5924 − 50.529 ∙ (1/8) ∙ 67 = 317.32
Regression line: Ŷ = β̂₀ + β̂₁ X = 317.32 + 50.529 ∙ X, where Y: #customers and X: #ads
The estimated intercept β̂₀ is 317.32 customers. The estimated slope β̂₁ is 50.529 customers per
ad.
b)
s_Y² = (1/(n−1)) ∙ (Σ Yᵢ² − (1/n)(Σ Yᵢ)²) = (1/7) ∙ (4 671 440 − (1/8) ∙ 5924²) = 40 674
s_XY = (1/(n−1)) ∙ (Σ XᵢYᵢ − (1/n) Σ Xᵢ Σ Yᵢ) = (1/7) ∙ 4945.5 = 706.5
s_X² = (1/(n−1)) ∙ (Σ Xᵢ² − (1/n)(Σ Xᵢ)²) = (1/7) ∙ 97.875 = 13.982
s_û² = SSR/(n−2) = ((n−1)/(n−2)) ∙ (s_Y² − s_XY²/s_X²) = (7/6) ∙ (40 674 − 706.5²/13.982) = 5804.26
⟹ s_û = √s_û² = 76.19
The standard error of the regression (SER = s_û) is an estimator of the standard deviation of the
regression error uᵢ = Yᵢ − β₀ − β₁Xᵢ.
c)
R² = ESS/TSS = 1 − SSR/TSS = 1 − ((n−2) ∙ s_û²) / ((n−1) ∙ s_Y²) = 1 − (6 ∙ 5804.26) / (7 ∙ 40 674) = 0.878
The regression R² is the fraction of the sample variance s_Y² explained by (or predicted by) X.
d)
ANOVA       Sum of Squares     df   Mean Square
Explained   ESS = 249 892.5     1
Residuals   SSR = 34 825.5      6   s_û² = 5804.26
Total       TSS = 284 718       7   s_Y² = 40 674
R² = 0.878

25
Solution H.6.3
EXERCISE = weekly exercises in minutes
BEFORE = cholesterol level before exercise program
AFTER = cholesterol level after exercise program
The reduction in cholesterol is calculated as: RED = BEFORE − AFTER
a)
β̂₁ = s_XY / s_X² = 1240.600 / 13641.307 = 0.0909444
or β̂₁ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² = 60 789.400 / 668 424.020 = 0.0909444
For each additional minute of exercise, cholesterol is reduced, on average, by 0.09094.
β̂₀ = Ȳ − β̂₁ X̄ = 27.8000 − 0.0909444 ∙ 283.1400 = 2.050
R̂ED = 2.050 + 0.09094 ∙ EXERCISE
b)

ANOVA       Sum of Squares   df   Mean Square
Explained   ESS = 5 533.5     1
Residuals   SSR = 5 316.5    48   s_û² = 110.760
Total       TSS = 10 850     49   s_Y² = 221.429
R² = 0.510
c) Hence
R² = r_XY² = 0.714² = 0.510
51% is the fraction of the variation of the variable RED that is explained by the variation of the variable
EXERCISE.
s_û = √(SSR/(n−2)) = √(5316.5/48) = 10.5293
s_û is an estimator of the standard deviation of the regression error uᵢ = Yᵢ − β₀ − β₁Xᵢ.

Solution H.6.4
The sample size n = 100. The estimated regression equation is
Ŵeight = −79.24 + 4.16 ∙ Height,   R² = 0.72,   SER = 12.6
a. Substituting Height = 64, 68, and 72 inches into the equation, the predicted weights are
−79.24 + 4.16 ∙ 64 = 187 pounds, −79.24 + 4.16 ∙ 68 = 203.64 pounds and
−79.24 + 4.16 ∙ 72 = 220.28 pounds
b. ΔŴeight = 4.16 ∙ ΔHeight = 4.16 ∙ 2 = 8.32 pounds
c. First, we rewrite the original estimated regression equation as
Ŷ = β̂₀ + β̂₁ X,
and express the estimated regression equation in the centimeter–kilogram space as
Ŷ_kg = γ̂₀ + γ̂₁ X_cm

26
Using the conversion of measurements Y_kg = 0.4536 ∙ Y and X_cm = 2.54 ∙ X in the LS formulas for the
slope and intercept gives
γ̂₁ = s_{Xcm,Ykg} / s²_{Xcm} = (2.54 ∙ 0.4536 ∙ s_XY) / (2.54² ∙ s_X²) = (0.4536 ∙ s_XY) / (2.54 ∙ s_X²)
   = (0.4536/2.54) ∙ β̂₁ = (0.4536/2.54) ∙ 4.16 = 0.743   and
γ̂₀ = Ȳ_kg − γ̂₁ X̄_cm = 0.4536 ∙ Ȳ − (0.4536/2.54) ∙ β̂₁ ∙ 2.54 ∙ X̄ = 0.4536 ∙ (Ȳ − β̂₁ X̄) = 0.4536 ∙ β̂₀
   = 0.4536 ∙ (−79.24) = −35.94
The coefficient of determination is unit free, i.e.
R²_{cm;kg} = r²_{Xcm,Ykg} = (s_{Xcm,Ykg} / (√s²_{Xcm} ∙ √s²_{Ykg}))²
   = ((2.54 ∙ 0.4536 ∙ s_XY) / (2.54 ∙ √s_X² ∙ 0.4536 ∙ √s_Y²))² = r_XY² = R²
and therefore remains at 0.72.
The residuals can be expressed as
Y_kg − γ̂₀ − γ̂₁ X_cm = 0.4536 ∙ Y − 0.4536 ∙ β̂₀ − (0.4536/2.54) ∙ β̂₁ ∙ 2.54 ∙ X = 0.4536 ∙ (Y − β̂₀ − β̂₁ X)
and therefore
SER_{cm,kg} = 0.4536 ∙ SER = 0.4536 ∙ 12.6 = 5.71536.
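The three scaling results (slope, intercept, R²) can also be verified numerically. The sketch below uses made-up height–weight data (the individual observations of the exercise are not available), fits the regression in both unit systems, and checks the conversion factors:

```python
import random

random.seed(1)

# Hypothetical height (inches) / weight (pounds) data, for illustration only.
x = [random.uniform(60, 76) for _ in range(100)]
y = [-79.24 + 4.16 * xi + random.gauss(0, 12) for xi in x]

def ols(x, y):
    """Return (intercept, slope, R^2) of the LS fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    b1 = sxy / sxx
    return my - b1 * mx, b1, sxy ** 2 / (sxx * syy)

b0, b1, r2 = ols(x, y)                         # inch-pound fit
g0, g1, r2c = ols([2.54 * a for a in x],       # same data in cm ...
                  [0.4536 * b for b in y])     # ... and kg

assert abs(g1 - 0.4536 / 2.54 * b1) < 1e-9   # slope rescaled
assert abs(g0 - 0.4536 * b0) < 1e-9          # intercept rescaled
assert abs(r2 - r2c) < 1e-9                  # R^2 is unit free
```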

Solution H.6.5
a) The coefficient 8.8 shows the marginal effect of AMI on AME: AME is expected to increase by €8.8 for
each additional euro of average monthly income.
The intercept of the regression line is 710.7. It determines the overall level of the line. The interpreta-
tion is that a worker with €0 average monthly income has predicted average monthly expenditures of
€710.7. This interpretation is not sensible, however, since a monthly income of €0 is not realistic. More-
over, prediction is not safe for an income outside the range of observed monthly incomes (€100 to
€1.5 million), and a monthly income of €0 is outside this range.
b) 𝑆𝑆𝑆𝑆𝑆𝑆 is in the same units as the dependent variable (𝑌𝑌, or 𝐴𝐴𝐴𝐴𝐴𝐴 in this example). Thus, 𝑆𝑆𝑆𝑆𝑆𝑆 is measured
in euros per month.
c) 𝑅𝑅 2 is unit free
d) Substituting 𝐴𝐴𝐴𝐴𝐴𝐴 = 100 and 𝐴𝐴𝐴𝐴𝐴𝐴 = 200 into the equation, the predicted monthly expenditure are
710.7 + 8.8 ∙ 100 = € 1590.7 and 710.7 + 8.8 ∙ 200 = € 2470.7, respectively.
e) No. The highest income in the sample is €1.5 million, so €2 million is far outside the range of the sample
data.
f) No. The distribution of earnings is positively skewed and has kurtosis larger than the normal.

Solution H.6.6
a) The error term uᵢ represents factors other than nap time that influence the participant’s performance on
the test, including inherent cognitive ability and aptitude. Some participants may have better memories
and some might have weaker ones.
b) Because of random assignment, uᵢ is independent of Xᵢ. Since uᵢ represents deviations from the average,
E(uᵢ) = 0. Because uᵢ is independent of Xᵢ for all i = 1, …, n, we have E(uᵢ|Xᵢ) = E(uᵢ) = 0.
This means that the estimated coefficients will be unbiased.

27
c) The estimated gain in score equals
ΔŶ = 0.17 ∙ ΔX = 0.17 ∙ 5 = 0.85.

Solution H.6.7
When you do OLS with the data of Dᵢ and Fᵢ, you get the following.
γ̂₁ = s_FD / s_F² = s_{32+1.8C, 1.25E} / s²_{32+1.8C} = (1.8 ∙ 1.25 ∙ s_CE) / (1.8² ∙ s_C²) = (1.25/1.8) ∙ β̂₁
   = (1.25/1.8) ∙ 2.16 = 1.5
γ̂₀ = D̄ − γ̂₁ F̄ = 1.25 Ē − 1.5 (32 + 1.8 C̄) = 1.25 Ē − 48 − 2.7 C̄ = 1.25 (Ē − 2.16 C̄) − 48
   = 1.25 β̂₀ − 48 = 1.25 ∙ (−4.32) − 48 = −53.4

Solution H.6.8
• E(X) = 5 + 2 ∙ 0 = 5 and E(U) = 0, so E(Y) = 60 − 3 ∙ 5 + 0 = 45
• As X and U are independent, we have var(Y) = (−3)² var(X) + var(U) = 9 ∙ 2² + 7² = 85
• E(Y|X) = 60 − 3X + E(U|X) = 60 − 3X + E(U) = 60 − 3X
• var(Y|X) = var(U|X) = var(U) = 7² = 49
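A Monte Carlo check of these population values: the Python sketch below mimics the STATA commands (rnormal() is replaced by random.gauss(0, 1)) with many draws, so the sample moments land close to E(Y) = 45 and var(Y) = 85.

```python
import random

random.seed(42)
N = 200_000  # many draws instead of STATA's 30, to estimate the moments precisely

ys = []
for _ in range(N):
    x = 5 + 2 * random.gauss(0, 1)   # gen x = 5 + 2*rnormal()
    u = 7 * random.gauss(0, 1)       # gen u = 7*rnormal()
    ys.append(60 - 3 * x + u)        # gen y = 60 - 3*x + u

mean_y = sum(ys) / N
var_y = sum((y - mean_y) ** 2 for y in ys) / (N - 1)

print(round(mean_y, 2), round(var_y, 1))  # close to 45 and 85
```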

Solution H.6.9
a) With the law of iterated expectations we have E(uᵢ) = E(E(uᵢ|Xᵢ)) = E(0) = 0.
Using this and using the law of iterated expectations once more we get:
cov(Xᵢ, uᵢ) = E(Xᵢuᵢ) − E(Xᵢ)E(uᵢ) = E(Xᵢuᵢ) = E(E(Xᵢuᵢ|Xᵢ)) = E(Xᵢ E(uᵢ|Xᵢ)) = E(0) = 0
b) Applying LS to the second equation leads to γ̂₁ = s_XY / s_Y², so 1/γ̂₁ = s_Y² / s_XY.
Now 1/γ̂₁ = s_Y²/s_XY →p var(Yᵢ)/cov(Xᵢ, Yᵢ) = σ_Y²/σ_XY. Note that due to the second LS assumption,
that (Xᵢ, Yᵢ) are i.i.d. for i = 1, …, n, we know that var(Yᵢ) and cov(Xᵢ, Yᵢ) are constant.
In addition it follows that uᵢ = Yᵢ − β₀ − β₁Xᵢ is also i.i.d. with constant var(uᵢ).
The first LS assumption says E(uᵢ|Xᵢ) = 0 so that cov(Xᵢ, uᵢ) = 0 (see part a.), so that
cov(Xᵢ, Yᵢ) = β₁ var(Xᵢ) + cov(Xᵢ, uᵢ) = β₁ var(Xᵢ) and
var(Yᵢ) = β₁² var(Xᵢ) + var(uᵢ) + 2β₁ cov(Xᵢ, uᵢ) = β₁² var(Xᵢ) + var(uᵢ)
Therefore we get:
1/γ̂₁ = s_Y²/s_XY →p var(Yᵢ)/cov(Xᵢ, Yᵢ) = (β₁² var(Xᵢ) + var(uᵢ)) / (β₁ var(Xᵢ)) = β₁ + var(uᵢ)/(β₁ var(Xᵢ)) ≠ β₁
so 1/γ̂₁ does not converge in probability to β₁: it is an inconsistent estimator of β₁.
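The probability limit derived above can be seen in a simulation. The sketch below uses the illustrative choices β₀ = 1, β₁ = 2, var(Xᵢ) = 1, var(uᵢ) = 4 (not from the exercise), for which 1/γ̂₁ should converge to β₁ + var(uᵢ)/(β₁ var(Xᵢ)) = 2 + 4/2 = 4, not to β₁ = 2.

```python
import random

random.seed(0)
n = 200_000
beta0, beta1 = 1.0, 2.0  # illustrative true coefficients

x = [random.gauss(0, 1) for _ in range(n)]                  # var(X) = 1
y = [beta0 + beta1 * xi + random.gauss(0, 2) for xi in x]   # var(u) = 4

mx, my = sum(x) / n, sum(y) / n
s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
s_y2 = sum((b - my) ** 2 for b in y) / (n - 1)

inv_gamma1 = s_y2 / s_xy  # 1/gamma1-hat from the reverse regression
print(round(inv_gamma1, 2))  # close to 4, far from beta1 = 2
```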

Solution H.6.10
a)
E(b₁) = E(β̂₁) + E(1/n) = β₁ + 1/n   and   var(b₁) = var(β̂₁) = (1/n) ∙ var[(Xᵢ − μ_X)uᵢ] / [var(Xᵢ)]²
b) Clearly, b₁ is not unbiased as
E(b₁) = β₁ + 1/n ≠ β₁
However, b₁ is consistent, since E(b₁) = β₁ + 1/n → β₁ and
var(b₁) = (1/n) ∙ var[(Xᵢ − μ_X)uᵢ] / [var(Xᵢ)]² → 0 when n → ∞
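A small Monte Carlo experiment illustrates the result: for finite n the mean of b₁ sits near β₁ + 1/n, and the bias vanishes as n grows. The data-generating process below (β₁ = 3, σ_u = 1, σ_X = 2) is an illustrative choice, not part of the exercise.

```python
import random

random.seed(7)

def mean_b1(n, reps, beta1=3.0):
    """Average of b1 = beta1_hat + 1/n over many simulated samples."""
    total = 0.0
    for _ in range(reps):
        x = [random.gauss(0, 2) for _ in range(n)]
        y = [beta1 * xi + random.gauss(0, 1) for xi in x]
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        total += sxy / sxx + 1 / n   # b1 = beta1_hat + 1/n
    return total / reps

small, large = mean_b1(50, 2000), mean_b1(2000, 50)
print(round(small, 3))  # near 3 + 1/50 = 3.02: biased for small n
print(round(large, 3))  # near 3 + 1/2000, i.e. essentially 3
```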

28
Week 6
Tutorial exercises
Question T.6.1 (adapted from Exam 28-3-2008)
A company produces external disk drives. The management wants to design a regression model with the
number of disk drives sold as the dependent variable and its price as the explanatory variable. (Also, the
prices of competitors are recorded.)

observation   PCOMP (price of competitors)   Price   Number sold
1 120 100 102
2 140 110 100
3 190 90 120
4 130 150 77
5 155 210 46
6 175 150 93
7 125 250 26
8 145 270 69
9 180 300 65
10 150 250 85

Correlations
PCOMP Price Number
PCOMP Pearson Correlation 1 ….. .318
Sig. (2-tailed) ….. .371
Sum of Squares and Cross-products 5190.000 820.000 1927.000
Covariance 576.667 91.111 214.111
N 10 10 10
Price Pearson Correlation ….. 1 -.726
Sig. (2-tailed) ….. .017
Sum of Squares and Cross-products 820.000 53760.000 -14164.000
Covariance 91.111 5973.333 -1573.778
N 10 10 10
Number Pearson Correlation .318 -.726 1
Sig. (2-tailed) .371 .017
Sum of Squares and Cross-products 1927.000 -14164.000 7076.100
Covariance 214.111 -1573.778 786.233
N 10 10 10

MODEL 1
Model Summaryb

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .....   .....      .468                .....                        1.855
a. Predictors: (Constant), Price b. Dependent Variable: Number

29
ANOVAb

Model Sum of Squares df Mean Square F Sig.

1 Regression

Residual

Total
a. Predictors: (Constant), Price
b. Dependent Variable: Number

a. What are the estimates for the constant and slope of the regression line in the simple regression
model, with the variable ‘Price’ as the explanatory variable?
b. What is the point prediction for the number of sold drives when the price equals 130?
c. Enter the missing values in the ANOVA table for MODEL 1.
d. Give the interpretation of 𝛽𝛽̂1 , 𝛽𝛽̂0 , 𝑌𝑌�, 𝑆𝑆𝑆𝑆𝑆𝑆, and 𝑅𝑅 2.

Question T.6.2 (resit 4-1-2016, modified)


Consider a simple linear regression model:
Yᵢ = β₀ + β₁Xᵢ + uᵢ   (i = 1, …, n)
OLS is applied, which gives the LS estimators β̂₀ and β̂₁.
Write Ŷᵢ for the fitted or predicted value and ûᵢ for the residual.
a. Define Ŷᵢ and ûᵢ and also the sum of squared residuals SSR.
b. Show that Σᵢ₌₁ⁿ ûᵢ = 0 and Σᵢ₌₁ⁿ Xᵢûᵢ = 0 using the first-order conditions.
c. Show that (1/(n−1)) Σᵢ₌₁ⁿ (ûᵢ − û̄)² = SSR/(n−1) and also that
   s_Ŷû = (1/(n−1)) Σᵢ₌₁ⁿ (Ŷᵢ − Ŷ̄)(ûᵢ − û̄) = 0.
   Can you give an explanation of the last result?
d. Use the result of part c. to show that
   Σᵢ₌₁ⁿ (Yᵢ − Ȳ)² / (n−1) = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)² / (n−1) + Σᵢ₌₁ⁿ (ûᵢ − û̄)² / (n−1)
   (This means that we have shown that TSS = ESS + SSR.)

30
Exercise T.6.3 (midterm 28-9-2017, Q2ab modified)
In practice the researcher does not know the exact data generating process. Here you will consider a case
where you do know the exact data generating process, so that you can use this knowledge to evaluate the
quality of the least squares (LS) estimators.
Assume that 𝑌𝑌𝑖𝑖 = 𝛽𝛽0 + 𝛽𝛽1 𝑋𝑋𝑖𝑖 + 𝑢𝑢𝑖𝑖 for 𝑖𝑖 = 1, … , 𝑛𝑛 where 𝛽𝛽0 and 𝛽𝛽1 are unknown parameters and each
pair (𝑋𝑋𝑖𝑖 , 𝑢𝑢𝑖𝑖 ) is drawn randomly from the following simultaneous distribution:
𝑢𝑢𝑖𝑖 = −2 𝑢𝑢𝑖𝑖 = −1 𝑢𝑢𝑖𝑖 = 0 𝑢𝑢𝑖𝑖 = 1 𝑢𝑢𝑖𝑖 = 2 total
𝑋𝑋𝑖𝑖 = 0 0.02 0.08 0.22 0.04 0.06 0.42
𝑋𝑋𝑖𝑖 = 1 0.08 0.10 0.24 0.10 0.06 0.58
total 0.10 0.18 0.46 0.14 0.12 1.00
a. Give the LS assumptions (Key Concept 4.3) of this model and check the validity of each assumption.
b. Suppose data of (𝑋𝑋𝑖𝑖 , 𝑌𝑌𝑖𝑖 ) are obtained for 𝑖𝑖 = 1, … , 𝑛𝑛. Is the LS-estimator of 𝛽𝛽1 unbiased?

Question T.6.4 (S&W 4.8)


Suppose all of the regression assumptions in Key Concept 4.3 are satisfied, except that the first assumption
is replaced with 𝐸𝐸(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 ) = 2. Which parts of Key Concept 4.4 continue to hold? Which change? Why?
(Is 𝛽𝛽̂1 normally distributed in large samples with mean and variance given in Key Concept 4.4? What about
𝛽𝛽̂0 ?)

31
Week 7
Homework exercises
Exercise H.7.1 (continuation from H.6.1)
Referring to Exercise H.6.1 on page 19 and its Solution H.6.1 on page 24.
Test if there is sufficient evidence to conclude that 𝑋𝑋 has an effect on 𝑌𝑌. Use 𝛼𝛼 = 0.10 and the
assumption of homoskedasticity.
Give (1) hypotheses, (2) test statistic and its distribution, (3) conditions, (4) rejection region, (5) outcome,
(6) confrontation and decision, (7) conclusion.

Exercise H.7.2 (continuation from H.6.2)


Referring to Exercise H.6.2 on page 19 and its Solution H.6.2 on page 25.
Is there enough evidence to indicate that the number of ads has a positive effect on the number of
customers? Use 𝛼𝛼 = 0.01 and the assumption of homoskedasticity.
Give (1) hypotheses, (2) test statistic and its distribution, (3) conditions, (4) rejection region, (5) outcome,
(6) confrontation and decision, (7) conclusion, (8) 𝑝𝑝-𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣.

Exercise H.7.3 (continuation from H.6.3)


Referring to Exercise H.6.3 on page 19-20 and its Solution H.6.3 on page 26.
a) Conduct a complete test, based on the coefficient 𝛽𝛽1 , to see whether there is sufficient evidence
to conclude that 𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸 has an effect on RED. Use 𝛼𝛼 = 0.05 and the assumption of
homoskedasticity.
Give (1) hypotheses, (2) test statistic and its distribution, (3) conditions, (4) rejection region, (5)
outcome, (6) confrontation and decision, (7) conclusion, (8) 𝑝𝑝-𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣.
b) Determine the 99% confidence interval for 𝛽𝛽1 .
c) What is the interpretation of this confidence interval?

Question H.7.4 (midterm 24-9-2018, modified )


In a simulation with STATA the following output is obtained:
set obs 500
number of observations (_N) was 0, now 500
set seed 123
generate x=5+2*rnormal()
generate y=4+3*x+100*rnormal()
regress y x
Source | SS df MS Number of obs = 500
-------------+---------------------------------- F(1, 498) = 0.64
Model | 5999.99514 1 5999.99514 Prob > F = 0.4248
Residual | 4683262.94 498 9404.14245 R-squared = 0.0013
-------------+---------------------------------- Adj R-squared = -0.0007
Total | 4689262.93 499 9397.32051 Root MSE = 96.975

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.778623 2.226733 0.80 0.425 -2.596326 6.153572
_cons | 15.27447 11.90309 1.28 0.200 -8.111984 38.66093
------------------------------------------------------------------------------

a. Given the obtained regression, what is the error in the estimation of the slope?

32
b. Use the relevant confidence interval in the STATA output to test whether 𝐻𝐻0 ∶ 𝛽𝛽1 = 7 is rejected
in favour of 𝐻𝐻1 ∶ 𝛽𝛽1 < 7 (usual notation). Which significance level do you use?
c. When a new sample with the same number of observations is generated, what do you know about
the distribution of the resulting slope estimator?

Exercise H.7.5 (midterm 26-9-2013 Q2, modified)


Consider the simple regression model 𝑌𝑌𝑖𝑖 = 𝛽𝛽0 + 𝛽𝛽1 𝑋𝑋𝑖𝑖 + 𝑢𝑢𝑖𝑖 (𝑖𝑖 = 1, … , 𝑛𝑛)
a. Specify exactly under which conditions it is appropriate (according to Stock and Watson) to apply LS
with robust standard errors in order to estimate and test the model. Clarify the quality of the LS
estimators (unbiased, efficient, consistent?) and the (approximate) distribution of the 𝑡𝑡-statistics.
b. Specify exactly under which conditions it is appropriate (according to Stock and Watson) to apply LS
with homoscedasticity-only standard errors in order to estimate and test the model when the
sample size 𝑛𝑛 is small. Clarify the quality of the LS estimators (unbiased, efficient, consistent?) and
the distribution of the 𝑡𝑡-statistics.
c. Formulate the Gauss-Markov theorem.

Exercise H.7.6 (S&W 5.8)


Suppose (𝑌𝑌𝑖𝑖 , 𝑋𝑋𝑖𝑖 ) satisfy the least squares assumptions in S&W Key Concept 4.3 and, in addition,
𝑢𝑢𝑖𝑖 ~ 𝑁𝑁(0, 𝜎𝜎𝑢𝑢2 ) and 𝑢𝑢𝑖𝑖 is independent of 𝑋𝑋𝑖𝑖 . A sample of size 𝑛𝑛 = 30 yields
𝑌𝑌� = 43.2 + 61.5𝑋𝑋, 𝑅𝑅 2 = 0.54, 𝑆𝑆𝑆𝑆𝑆𝑆 = 1.52
(10.2) (7.4) .
where the numbers in parentheses are the homoskedastic-only standard errors for the regression
coefficients.
a. Construct a 95% confidence interval for 𝛽𝛽0 .
b. Test 𝐻𝐻0 ∶ 𝛽𝛽1 = 55 vs 𝐻𝐻1 ∶ 𝛽𝛽1 ≠ 55 at the 5% level.
c. Test 𝐻𝐻0 ∶ 𝛽𝛽1 = 55 vs 𝐻𝐻1 ∶ 𝛽𝛽1 > 55 at the 5% level.

Exercise H.7.7 (midterm 24-9-2015 Q3, modified)


Are the following claims true or false? Carefully explain your answers. (Only correct answers with correct
justification will be rewarded.)
a. Ignoring the nonconstant variance of the error term will lead to biased estimators of the
coefficients in the simple regression model.
b. Ignoring the nonconstant variance of the error term will lead to invalid estimators of the
confidence intervals for the coefficients in the simple regression model.
c. The 𝑝𝑝-𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣, reported by STATA, of the coefficient of the explanatory variable in a simple
regression is found to be equal to 0.018. This means that 𝐻𝐻0 ∶ 𝛽𝛽 = 0 is rejected in favour of 𝐻𝐻1 ∶
𝛽𝛽 ≠ 0 at the 1% level.

Exercise H.7.8 (resit 10-1-2013 Q2)


a. Derive the least squares estimator of the model 𝑦𝑦𝑖𝑖 = 𝛼𝛼 + 𝑢𝑢𝑖𝑖 where 𝛼𝛼 is a constant and 𝑢𝑢𝑖𝑖 is the
error term.
b. Derive the variance of the OLS-estimator you derived in a. under the usual assumptions with
respect to OLS. Clearly specify when you make use of the assumptions. Are you deriving a
population or sample variance?

33
Exercise H.7.9 (S&W 5.6)
In the 1980s, Tennessee conducted an experiment in which a large sample of kindergarten students were
randomly assigned to “regular” and “small” classes and given standardized tests at the end of the year.
(Regular classes contained approximately 24 students, and small classes contained approximately 15
students.) Suppose, in the population, the standardized tests have a mean score of 925 points and a
standard deviation of 75 points. Let SmallClass denote a binary variable equal to 1 if the student is
assigned to a small class and equal to 0 otherwise. A regression of TestScore on SmallClass yields
T̂estScore = 918.0 + 13.9 ∙ SmallClass,   R² = 0.01,   SER = 74.6
              (1.6)    (2.5)
(between parentheses: heteroscedastic-robust standard errors)
a. Do you think that the regression errors are plausibly homoskedastic? Explain.
We can construct a 99% confidence interval for the effect of SmallClass on TestScore using
formula (5.3), where SE(β̂₁) = √(σ̂²_β̂₁).

b. Now, suppose the regression errors were homoskedastic. Would this affect the validity of the
confidence interval using the heteroscedastic-robust standard errors? Explain.

34
Week 7
Homework solutions
Solution H.7.1
From Solution H.6.1 on page 24: 𝑠𝑠𝑋𝑋2 = 5.952, 𝛽𝛽̂1 = 2.000, 𝑠𝑠𝑢𝑢� = 2.774
Conditions and assumptions
• model: 𝑌𝑌 = 𝛽𝛽0 + 𝛽𝛽1 𝑋𝑋 + 𝑢𝑢 (𝑘𝑘 = 1)
• three LS assumptions (Key Concept 4.3):
(1) 𝐸𝐸(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 ) = 0 for all 𝑖𝑖 = 1, … , 𝑛𝑛
(2) (𝑋𝑋𝑖𝑖 , 𝑌𝑌𝑖𝑖 ) are i. i. d. for 𝑖𝑖 = 1, … , 𝑛𝑛
(3) Large outliers of (𝑋𝑋𝑖𝑖 , 𝑌𝑌𝑖𝑖 ) are unlikely for all 𝑖𝑖 = 1, … , 𝑛𝑛
• two extended least squares assumptions (‘Classical’ inference):
(4) homoskedasticity: constant 𝑣𝑣𝑣𝑣𝑣𝑣(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 )
(5) normally distributed: 𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 ~ 𝑁𝑁 (𝑛𝑛 = 15, so the sample size is small)

Hypotheses
𝐻𝐻0 ∶ 𝛽𝛽1 = 0 versus 𝐻𝐻1 ∶ 𝛽𝛽1 ≠ 0
Test statistic and its distribution
T = (β̂₁ − β₁,₀) / SE(β̂₁) ~ t[df = n − 2 = 13]   where SE(β̂₁) = s_û / √((n − 1) s_X²)   (homoskedastic)
Rejection region
α = 0.10 two-tailed ⟹ T ≥ t_crit = t_0.05,13 = 1.771 or T ≤ −t_crit = −1.771
Sample outcome
t_obs = (β̂₁ − β₁,₀) / SE(β̂₁) = (2.000 − 0) / (2.774 / √((15 − 1) ∙ 5.952)) = 6.58

Confrontation and decision
𝑡𝑡𝑜𝑜𝑜𝑜𝑜𝑜 ≥ 𝑡𝑡𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 ⟹ reject 𝐻𝐻0
Conclusion
Given a significance level of 10%, there is sufficient evidence to conclude that 𝑋𝑋 has an effect on 𝑌𝑌.
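The numerical steps of this test are easy to reproduce; here is a Python sketch using the values carried over from Solution H.6.1 (the critical value 1.771 is read from the t table, as above):

```python
import math

n = 15
s_x2 = 5.952    # sample variance of X (Solution H.6.1)
b1_hat = 2.000  # estimated slope
s_u = 2.774     # standard error of the regression

se_b1 = s_u / math.sqrt((n - 1) * s_x2)  # homoskedasticity-only SE of the slope
t_obs = (b1_hat - 0) / se_b1             # test statistic under H0: beta1 = 0

t_crit = 1.771  # t_{0.05, 13} from the table (alpha = 0.10, two-tailed)
print(round(t_obs, 2), abs(t_obs) >= t_crit)  # 6.58 True -> reject H0
```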

Solution H.7.2
From Solution H.6.2 on page on page 25: 𝑠𝑠𝑋𝑋2 = 13.982, 𝛽𝛽̂1 = 50.529, 𝑠𝑠𝑢𝑢� = 76.19
Conditions and assumptions
• model: 𝑌𝑌 = 𝛽𝛽0 + 𝛽𝛽1 𝑋𝑋 + 𝑢𝑢 (𝑘𝑘 = 1)
• three LS assumptions (Key Concept 4.3):
(1) 𝐸𝐸(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 ) = 0 for all 𝑖𝑖 = 1, … , 𝑛𝑛
(2) (𝑋𝑋𝑖𝑖 , 𝑌𝑌𝑖𝑖 ) are i. i. d. for 𝑖𝑖 = 1, … , 𝑛𝑛
(3) Large outliers of (𝑋𝑋𝑖𝑖 , 𝑌𝑌𝑖𝑖 ) are unlikely for all 𝑖𝑖 = 1, … , 𝑛𝑛
• two extended least squares assumptions (‘Classical’ inference):
(4) homoskedasticity: constant 𝑣𝑣𝑣𝑣𝑣𝑣(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 )
(5) normally distributed: 𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 ~ 𝑁𝑁 (𝑛𝑛 = 8, so the sample size is small)

35
Hypotheses
𝐻𝐻0 ∶ 𝛽𝛽1 = 0 versus 𝐻𝐻1 ∶ 𝛽𝛽1 > 0
Test statistic and its distribution
T = (β̂₁ − β₁,₀) / SE(β̂₁) ~ t[df = n − 2 = 6]   where SE(β̂₁) = s_û / √((n − 1) s_X²)   (homoskedastic)
Rejection region
α = 0.01 one-tailed ⟹ T ≥ t_crit = t_0.01,6 = 3.143
Sample outcome
t_obs = (β̂₁ − β₁,₀) / SE(β̂₁) = (50.529 − 0) / (76.19 / √((8 − 1) ∙ 13.982)) = 6.56

Confrontation and decision
𝑡𝑡𝑜𝑜𝑜𝑜𝑜𝑜 ≥ 𝑡𝑡𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 ⟹ reject 𝐻𝐻0
Conclusion
Given the significance level of 1%, there is enough evidence to indicate that the number of ads has
a positive effect on the number of customers
p-value
Table, df = 6: p-value = P(T ≥ 6.56) < P(T ≥ 3.707) = 0.005

Solution H.7.3
From Solution H.6.3 on page 26: 𝑠𝑠𝑋𝑋2 = 13641.307, 𝛽𝛽̂1 = 0.09094, 𝑠𝑠𝑢𝑢� = 10.5293.
a)
Conditions and assumptions
• model: 𝑌𝑌 = 𝛽𝛽0 + 𝛽𝛽1 𝑋𝑋 + 𝑢𝑢 (𝑘𝑘 = 1)
• three LS assumptions (Key Concept 4.3):
(1) 𝐸𝐸(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 ) = 0 for all 𝑖𝑖 = 1, … , 𝑛𝑛
(2) (𝑋𝑋𝑖𝑖 , 𝑌𝑌𝑖𝑖 ) are i. i. d. for 𝑖𝑖 = 1, … , 𝑛𝑛
(3) Large outliers of (𝑋𝑋𝑖𝑖 , 𝑌𝑌𝑖𝑖 ) are unlikely for all 𝑖𝑖 = 1, … , 𝑛𝑛
• extended least squares assumption (𝑛𝑛 = 50, so the sample size appears to be sufficiently large):
(4) homoskedasticity: constant 𝑣𝑣𝑣𝑣𝑣𝑣(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 )
Hypotheses
𝐻𝐻0 ∶ 𝛽𝛽1 = 0 versus 𝐻𝐻1 ∶ 𝛽𝛽1 ≠ 0
Test statistic and its distribution
T = (β̂₁ − β₁,₀) / SE(β̂₁) ~ t[df = n − 2 = 48 ≈ 50]   where SE(β̂₁) = s_û / √((n − 1) s_X²)   (homoskedastic)
Rejection region
T ≥ t_crit = t_0.025,50 = 2.009 or T ≤ −t_crit = −2.009
Sample outcome
t_obs = (β̂₁ − β₁,₀) / SE(β̂₁) = (0.09094 − 0) / (10.5293 / √((50 − 1) ∙ 13641.307)) = 7.061

36
Confrontation and decision
𝑡𝑡𝑜𝑜𝑜𝑜𝑜𝑜 ≥ 𝑡𝑡𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 ⟹ reject 𝐻𝐻0
Conclusion
Given a significance level of 5%, there is sufficient evidence to conclude that EXERCISE has an
effect on RED
p-value
p-value = 2 ∙ P(T ≥ t_obs = 7.061) < 2 ∙ P(T ≥ 2.678) = 2 ∙ 0.005 = 0.01
b)
SE(β̂₁) = s_û / √((n − 1) s_X²) = 10.5293 / √((50 − 1) ∙ 13641.307) = 0.01288
The 99% confidence interval for β₁:
[β̂₁ − t_0.005,50 ∙ SE(β̂₁), β̂₁ + t_0.005,50 ∙ SE(β̂₁)]
= [0.09094 − 2.678 ∙ 0.01288, 0.09094 + 2.678 ∙ 0.01288]
= [0.056, 0.125]
c) In 99% of all possible samples, an interval constructed in this way contains the true value of β₁.
(Or: with 99% confidence, the true value of β₁ lies between 0.056 and 0.125.)
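The standard error and interval in part b) can be reproduced directly; a Python sketch (2.678 is the tabulated t_0.005,50, as above):

```python
import math

n = 50
s_x2 = 13641.307   # sample variance of EXERCISE (Solution H.6.3)
b1_hat = 0.09094   # estimated slope
s_u = 10.5293      # standard error of the regression

se_b1 = s_u / math.sqrt((n - 1) * s_x2)  # homoskedasticity-only SE, ~0.01288
t_crit = 2.678                           # t_{0.005, 50} from the table

lower = b1_hat - t_crit * se_b1
upper = b1_hat + t_crit * se_b1
print(round(se_b1, 5), round(lower, 3), round(upper, 3))  # 0.01288 0.056 0.125
```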

Solution H.7.4
a. The following line of STATA code: generate y=4+3*x+100*rnormal() implies that 𝛽𝛽1 = 3.
Therefore, the estimation error equals
𝛽𝛽̂1 − 𝛽𝛽1 = 1.778623 − 3 = −1.221377
b. The 95% confidence interval for 𝛽𝛽1 in the STATA output is [−2.596326, 6.153572 ], which means
that it lies completely below the test value of 7. Hence, there is sufficient statistical evidence to infer
that 𝛽𝛽1 is smaller than 7.
Because the two-sided confidence interval is used to test a one-sided alternative hypothesis, there is
only a one-directional risk, and thus we are testing with a significance level of 5%/2 = 2.5%.
c. In the simulation all Gauss–Markov assumptions are satisfied (Key Concept 5.5) and uᵢ ~ N(0, 100²).
Hence, β̂₁ ~ N(3, σ²_β̂₁) with (homoskedasticity only)
σ²_β̂₁ = σ_u² / (n ∙ σ_X²) = 100² / (500 ∙ 2²) = 10 000 / 2000 = 5.
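The claim in part c can be checked by repeating the STATA experiment many times. This Python sketch mimics the commands (new x and u in every replication) and looks at the mean and standard deviation of the slope estimates, which should be close to 3 and √5 ≈ 2.236.

```python
import math
import random

random.seed(2024)
reps, n = 1000, 500

slopes = []
for _ in range(reps):
    # One replication of: gen x = 5 + 2*rnormal(); gen y = 4 + 3*x + 100*rnormal()
    x = [5 + 2 * random.gauss(0, 1) for _ in range(n)]
    y = [4 + 3 * xi + 100 * random.gauss(0, 1) for xi in x]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slopes.append(sxy / sxx)  # LS slope of this replication

m = sum(slopes) / reps
sd = math.sqrt(sum((s - m) ** 2 for s in slopes) / (reps - 1))
print(round(m, 1), round(sd, 2))  # mean near 3, sd near sqrt(5) = 2.24
```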

Solution H.7.5
a. Under the three LS assumptions (Key Concept 4.3):
(1) 𝐸𝐸(𝑢𝑢𝑖𝑖 |𝑋𝑋𝑖𝑖 ) = 0 for all 𝑖𝑖 = 1, … , 𝑛𝑛
(2) (𝑋𝑋𝑖𝑖 , 𝑌𝑌𝑖𝑖 ) are i. i. d. for 𝑖𝑖 = 1, … , 𝑛𝑛
(3) Large outliers of (𝑋𝑋𝑖𝑖 , 𝑌𝑌𝑖𝑖 ) are unlikely for all 𝑖𝑖 = 1, … , 𝑛𝑛
and assuming that the sample size 𝑛𝑛 is large (‘Modern’ inference).
Under these conditions, the LS estimators are unbiased and consistent, and the 𝑡𝑡-statistics have
(approximately) a standard normal distribution. There is no result on efficiency.
b. Under the three LS assumptions (Key Concept 4.3):
(1) E(u_i | X_i) = 0 for all i = 1, …, n
(2) (X_i, Y_i) are i.i.d. for i = 1, …, n
(3) Large outliers of (X_i, Y_i) are unlikely for all i = 1, …, n
and two extended LS assumptions:
(4) homoskedasticity: constant var(u_i | X_i)
(5) normally distributed errors: u_i | X_i ~ N
When these 5 LS assumptions hold, the LS estimators have an exact normal sampling distribution, and
the homoskedasticity-only t-statistics have an exact Student t distribution. Furthermore, the first 4
assumptions imply that the LS estimators are Best Linear conditionally Unbiased Estimators (BLUE), so
also efficient.
c. If the three LS assumptions (Key Concept 4.3):
(1) E(u_i | X_i) = 0 for all i = 1, …, n
(2) (X_i, Y_i) are i.i.d. for i = 1, …, n
(3) Large outliers of (X_i, Y_i) are unlikely for all i = 1, …, n
and one extended LS assumption:
(4) homoskedasticity: constant var(u_i | X_i)
hold, the LS estimators are Best Linear conditionally Unbiased Estimators (BLUE). In other words, the
Gauss-Markov theorem states that the LS estimators have the smallest conditional variance of all linear
conditionally unbiased estimators.
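The Gauss-Markov property can be illustrated by simulation: compare the OLS slope with another linear, conditionally unbiased slope estimator. The sketch below uses the "endpoint" estimator (Y_n − Y_1)/(X_n − X_1), a hypothetical competitor chosen purely for illustration; all parameter values are made up.

```python
import numpy as np

# Gauss-Markov illustration: under homoskedasticity, the OLS slope has a smaller
# variance than the (also linear and conditionally unbiased) endpoint estimator
# (Y_n - Y_1)/(X_n - X_1). Illustrative DGP: Y = 1 + 2X + u, u ~ N(0, 1).
rng = np.random.default_rng(1)
n, reps = 30, 4000
x = np.linspace(0, 10, n)               # fixed regressor values
sxx = np.sum((x - x.mean()) ** 2)

ols, endpoint = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = 1.0 + 2.0 * x + rng.standard_normal(n)     # homoskedastic errors
    ols[r] = np.sum((x - x.mean()) * y) / sxx      # OLS slope
    endpoint[r] = (y[-1] - y[0]) / (x[-1] - x[0])  # endpoint slope

print(ols.var(), endpoint.var())   # OLS variance is clearly smaller
```

Both estimators center on the true slope 2, but the OLS variance (σ²/Σ(x_i − x̄)²) is several times smaller than the endpoint estimator's variance (2σ²/(x_n − x_1)²), in line with the theorem.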

Solution H.7.6
a. β̂0 ± t_{0.025,28} · SE(β̂0) = 43.2 ± 2.048 · 10.2 = 43.2 ± 20.89 gives (22.31, 64.09).
b. The t-statistic is t_act = (β̂1 − β1,0) / SE(β̂1) = (61.5 − 55) / 7.4 = 0.878, which is less (in absolute
value) than the critical value t_crit = t_{0.025,28} = 2.048. Thus, the null hypothesis is not rejected at
the 5% level.
c. The one-sided 5% critical value is t_{0.05,28} = 1.701.
t_act is less than this critical value, so the null hypothesis is not rejected at the 5% level.
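The arithmetic of parts a-c can be redone with computed rather than tabulated critical values (a Python sketch; all numbers come from the exercise, df = 28).

```python
from scipy.stats import t

# Parts a-c of H.7.6 with computed t critical values (df = 28).
t_crit_2sided = t.ppf(0.975, 28)                 # approx. 2.048
lo = 43.2 - t_crit_2sided * 10.2
hi = 43.2 + t_crit_2sided * 10.2
print(lo, hi)                                    # approx. (22.31, 64.09)

t_act = (61.5 - 55) / 7.4                        # approx. 0.878
print(abs(t_act) < t_crit_2sided)                # True: do not reject (two-sided, 5%)
print(t_act < t.ppf(0.95, 28))                   # True: do not reject (one-sided, 5%)
```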

Solution H.7.7
a. False. The unbiasedness of the OLS estimator has nothing to do with heteroscedasticity. In fact, to show
unbiasedness we only need the LS assumptions listed in Key Concept 4.3.
b. True. If one uses homoscedasticity-only standard errors, then ignoring the nonconstant variance of the
error term leads to incorrect standard errors for the OLS estimators. The use of robust standard errors
acknowledges the presence of heteroscedasticity.
c. False. Note that the p-value (computed assuming the null hypothesis is true) is the smallest significance
level at which one could reject the null. Here the p-value is larger than the significance level, so we could
not reject at the 1% level.

Solution H.7.8
a. min over α̂ of Σ_{i=1}^{n} (Y_i − α̂)²: the first-order condition −2 Σ_{i=1}^{n} (Y_i − α̂) = 0 gives α̂ = Ȳ.
b. Here we consider the population variance:

var(α̂) = var(Ȳ) = var((1/n) Σ_{i=1}^{n} Y_i) = (1/n²) var(Σ_{i=1}^{n} Y_i) = (1/n²) Σ_{i=1}^{n} var(Y_i)

Now we make use of the assumption that Y_i is i.i.d. (LS assumption #2) and that Y_i = α + u_i, so that

var(α̂) = (1/n²) Σ_{i=1}^{n} var(α + u_i) = (1/n²) Σ_{i=1}^{n} var(u_i)    since var(α) = 0

As we assumed that Y_i is identically distributed, var(u_i) = var(Y_i) is constant, say σ². (Note: as
there is no independent variable X_i, we do not work with the conditional variance var(u_i | X_i).)
Hence,

var(α̂) = (1/n²) Σ_{i=1}^{n} σ² = (1/n²) · n · σ² = σ²/n

which is the population variance of the sample mean.
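The result var(α̂) = σ²/n is easy to confirm by simulation. A minimal Python sketch, with illustrative (made-up) values α = 5, σ = 2, n = 25:

```python
import numpy as np

# Monte Carlo check that var(alpha_hat) = var(Ybar) = sigma^2 / n.
# Illustrative values: alpha = 5, sigma = 2, n = 25, so sigma^2/n = 4/25 = 0.16.
rng = np.random.default_rng(2)
alpha, sigma, n, reps = 5.0, 2.0, 25, 20000
samples = alpha + sigma * rng.standard_normal((reps, n))   # Y_i = alpha + u_i
means = samples.mean(axis=1)                               # alpha_hat = Ybar per sample
print(means.var(ddof=1))                                   # close to 0.16
```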

Solution H.7.9
a. The question asks whether the variability in test scores in large classes is the same as the variability in
small classes. It is hard to say. On the one hand, teachers in small classes might be able to spend more
time bringing all of the students along, reducing the poor performance of particularly unprepared
students. On the other hand, most of the variability in test scores might be beyond the control of the
teacher.
b. Formula (5.3) is valid under heteroskedasticity or homoskedasticity; thus inferences are valid in either
case.

Week 7
Tutorial exercises
Question T.7.1 (S&W 5.2)
Suppose that a researcher, using wage data on 200 randomly selected male workers and 240 female
workers, estimates the OLS regression

Wage-hat = 10.73 + 1.78 · Male,  R² = 0.09,  SER = 3.8
           (0.16)  (0.29)

(between parentheses: heteroscedastic-robust standard errors)
where Wage is measured in dollars per hour, and Male is a binary variable that is equal to 1 if the person
is male and 0 if the person is female. Define the wage gender gap as the difference in mean earnings
between men and women.
a. What is the estimated gender gap?
b. Is the estimated gender gap significantly different from 0?
(Compute the p-value for testing the null hypothesis that there is no gender gap.)
c. Construct a 95% confidence interval for the gender gap.
d. In the sample, what is the mean wage of women? Of men?
e. Another researcher uses these same data but regresses Wage on Female, a variable that is equal
to 1 if the person is female and 0 if the person is male. What are the regression estimates calculated
from this regression?

Wage-hat = ______ + ______ · Female,  R² = ______,  SER = ______

Question T.7.2 (midterm 25-9-2014 Q3)

Consider the model Y_i = β0 + β1·X_i + u_i where X_i and Y_i are observable and u_i is an unknown error
term.
a. Suppose you do a simulation of this model with STATA, where you generate 50 observations using
x=0.1-rnormal() and y=0.8*x+rnormal(). Next, you give the command regress y x
and look at the output. The slope is estimated by 0.92 with a standard error 0.14, and the
constant is estimated by 0.02 with a standard error 0.03.
i. What are the estimation errors?
ii. Test H0: β1 = 0.9 versus H1: β1 ≠ 0.9 with α = 5%. Is your conclusion correct? If not,
which type of error are you making?
iii. Same question as in ii. for testing H0: β0 = 0 versus H1: β0 ≠ 0.
b. Suppose you have two independent OLS estimates of β1, namely b_A which is obtained using
n_A = 200 observations and b_B which is obtained using n_B = 50 observations. For both samples
the LS assumptions are valid and the errors are homoscedastic. You consider combining the two
estimates by using b_C = (b_A + b_B)/2.
i. Why are b_A and b_B normally distributed? Give the population mean and population
variance of b_A and b_B.
ii. Is b_C also normally distributed? Derive the population mean and population variance of
b_C.
iii. Is b_C an efficient estimator? Explain.
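For readers who want to experiment with the data-generating process in part a, the following Python sketch mimics the STATA simulation (it sets up the DGP and fits OLS; it is not a solution to the question, and the random seed is arbitrary).

```python
import numpy as np

# Python analogue of the STATA simulation in part a:
# x = 0.1 - rnormal(), y = 0.8*x + rnormal(), so beta0 = 0 and beta1 = 0.8.
rng = np.random.default_rng(3)
n = 50
x = 0.1 - rng.standard_normal(n)
y = 0.8 * x + rng.standard_normal(n)

b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # OLS slope
b0 = y.mean() - b1 * x.mean()                 # OLS intercept
print(b0, b1)   # estimates vary around the true values 0 and 0.8
```

Rerunning with different seeds shows the sampling variability of the estimates around the true parameters, which is exactly what parts i-iii ask you to reason about.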

Question T.7.3 (S&W 5.7, modified)
Suppose (X_i, Y_i) satisfy the three least squares assumptions in Key Concept 4.3. A random sample of size
n = 250 is drawn and yields

Ŷ = 5.4 + 3.2 · X,  R² = 0.26,  SER = 6.2
    (3.1)  (1.5)

(between parentheses: homoskedasticity-only standard errors)
a) Test if there is sufficient evidence to conclude that X has an effect on Y. Use α = 0.05 and the
assumption of homoskedasticity.
b) Construct a 95% confidence interval for β1.
c) Suppose you learned that X_i and Y_i were independent. Would you be surprised? Explain.
d) Suppose X_i and Y_i are independent and many samples of size n = 250 are drawn, the model is
estimated for each sample, and parts a and b are answered. In what fraction of the samples
would the null hypothesis from part a be rejected? In what fraction of samples would the value
β1 = 0 be included in the confidence interval from part b?
