Stat2 2023 Syllabus B v1.0 Weeks 5-6-7
Syllabus B
Version 1.0 October 2023
Weeks 5-6-7
* * *
Week 5
S&W Ch.2.3 (p65-74, also 62)
S&W Ch.2.5 (p81-84)
S&W Ch.2.6 (p85-90)
S&W Ch.3.1 (p104-108), App3.2, p112-113 (in 3.2), p129 (in 3.7), (p694)
Week 6
S&W Ch.4, App4.2-4.3
Week 7
S&W Ch.5, App5.1 (and 18.1)
Supplement 5
Deriving the normal equations
The least squares method was introduced in Chapter 4. The objective was to determine the sample regression
line:
Ŷ = β̂₀ + β̂₁X
that would minimize the sum of squared deviations SSR between the points and the line. That is, the
method determines values for β̂₀ and β̂₁ such that

SSR = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²
Finding the values of β̂₀ and β̂₁ that minimize SSR is accomplished using differential calculus. We begin
by partially differentiating SSR with respect to β̂₀ and β̂₁, setting the partial derivatives equal to zero, and
solving the two equations:

∂SSR/∂β̂₀ = −2 · Σᵢ₌₁ⁿ (Yᵢ − (β̂₀ + β̂₁Xᵢ)) = 0

∂SSR/∂β̂₁ = −2 · Σᵢ₌₁ⁿ (Yᵢ − (β̂₀ + β̂₁Xᵢ)) Xᵢ = 0
These two equations can now be reduced to what are called “the normal equations”:

Σᵢ₌₁ⁿ Yᵢ = n·β̂₀ + β̂₁ Σᵢ₌₁ⁿ Xᵢ

Σᵢ₌₁ⁿ XᵢYᵢ = β̂₀ Σᵢ₌₁ⁿ Xᵢ + β̂₁ Σᵢ₌₁ⁿ Xᵢ²
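The normal equations can be solved directly for β̂₀ and β̂₁. Below is a minimal Python sketch (with made-up example data, not from the text) that solves the 2×2 system and then checks the two conditions the normal equations state:

```python
# Least squares via the normal equations (hypothetical example data).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)

Sx, Sy = sum(X), sum(Y)
Sxx = sum(x * x for x in X)
Sxy = sum(x * y for x, y in zip(X, Y))

# Normal equations:
#   Sy  = n*b0  + b1*Sx
#   Sxy = b0*Sx + b1*Sxx
# Solving the 2x2 system gives the familiar closed forms:
b1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)
b0 = (Sy - b1 * Sx) / n

# At the solution the residuals sum to zero and are orthogonal to X,
# which is exactly what the two normal equations assert.
resid = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
```

Both checks hold up to floating-point rounding; with this particular data the solution is β̂₁ ≈ 1.99 and β̂₀ ≈ 0.05.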
Supplement 6
Summary – ANOVA table
Simple regression (k = 1):  Ŷ = β̂₀ + β̂₁X

ANOVA       Sum of Squares          using R²        using s²                       df     Mean Square using s²
Explained   ESS = Σ(Ŷᵢ − Ȳ)²       = R²·TSS        = (n−1)·sXY²/sX²               1      ESS/1
Residuals   SSR = Σ(Yᵢ − Ŷᵢ)²      = (1−R²)·TSS    = (n−1)·(sY² − sXY²/sX²)       n−2    SSR/(n−2) = sû² = SER²
Total       TSS = Σ(Yᵢ − Ȳ)²       = ESS + SSR     = (n−1)·sY²                    n−1    TSS/(n−1) = sY²

R² = ESS/TSS = 1 − SSR/TSS = sXY²/(sX²·sY²) = 1 − (n−2)·sû²/((n−1)·sY²)
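The identities in the table can be verified numerically. A small Python sketch (with hypothetical data, not from the text):

```python
# Verify TSS = ESS + SSR and the R^2 identities from the ANOVA table.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

sx2 = sum((x - xbar) ** 2 for x in X) / (n - 1)                      # s_X^2
sy2 = sum((y - ybar) ** 2 for y in Y) / (n - 1)                      # s_Y^2
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / (n - 1)   # s_XY

b1 = sxy / sx2
b0 = ybar - b1 * xbar
Yhat = [b0 + b1 * x for x in X]

ESS = sum((yh - ybar) ** 2 for yh in Yhat)           # explained, df = 1
SSR = sum((y - yh) ** 2 for y, yh in zip(Y, Yhat))   # residual,  df = n - 2
TSS = sum((y - ybar) ** 2 for y in Y)                # total,     df = n - 1
R2 = ESS / TSS
```

With any data set, ESS + SSR reproduces TSS, ESS equals (n−1)·sXY²/sX², and R² equals sXY²/(sX²·sY²), as the table claims.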
Supplement 7
Formula sheet – Simple regression analysis (k = 1)

E(aX + bY + c) = a·μX + b·μY + c
var(aX + bY + c) = a²·σX² + b²·σY² + 2ab·σXY
cov(aX + bY + c, W) = a·σXW + b·σYW

E(Y) = E[E(Y|X)]
var(Y | X = x) = E( [Y − E(Y | X = x)]² | X = x )

σ²β̂₁ = (1/n) · var[(Xᵢ − μX)uᵢ] / [var(Xᵢ)]²
σ²β̂₀ = (1/n) · var(Hᵢuᵢ) / [E(Hᵢ²)]²   with Hᵢ = 1 − (μX / E(Xᵢ²)) · Xᵢ
Supplement 8
Overview of conditions for simple regression
Simple Linear Regression Model:

Yᵢ = β₀ + β₁Xᵢ + uᵢ   (i = 1, ..., n)

Conditions (Least Squares estimation, Ch.4):
1. E(uᵢ|Xᵢ) = 0
2. (Xᵢ, Yᵢ) for i = 1, ..., n are i.i.d.
3. Large outliers are unlikely: Xᵢ and Yᵢ have nonzero finite fourth moments.

Results: the LS coefficients are unbiased and consistent:
E(β̂₁) = β₁ and β̂₁ →ᵖ β₁;  E(β̂₀) = β₀ and β̂₀ →ᵖ β₀.
Exercise H.5.1: summation rules (see Solution H.5.1).
Exercise H.5.2
a. When you roll a common six-sided die the possible outcomes are 1, 2, 3, 4, 5, 6 pips. Suppose
instead you use a six-sided die with an adapted number of pips: 10, 20, 30, 40, 50, 60. When n
rolls are done, how will the mean, variance and standard deviation of the number of pips change?
Derive the changes with formulas.
b. Suppose you use a six-sided die with an adapted number of pips, which are: 10, 100, 1000, 10000,
100000, 1000000 (that is, 10¹, 10², 10³, 10⁴, 10⁵, 10⁶). Compared with the common die, how will the
sample mean, variance and standard deviation now change?
Exercise H.5.3
Suppose that for a population of individuals X = 1 for a male and X = 0 for a female, and Y = the
number of times the individual buys clothes in a month. The joint probability distribution of X and Y is
given in the table below:

              Y
          0     1     2    total
X    0   0.12  0.21  0.27   0.6
     1   0.20  0.16  0.04   0.4
total    0.32  0.37  0.31   1

a. Calculate the fraction that buys no clothes in a month for (i) females and (ii) males.
b. Calculate E(Y | X = 0) and var(Y | X = 0).
c. Calculate E(Y) using the law of iterated expectations.
d. Check whether X and Y are independent, using the results of the two previous questions.
Exercise H.5.4
The random variables X and Y are yearly returns (in %) of investment funds A and B respectively. It is
known that E(X) = 8, E(Y) = 6, var(X) = 47, var(Y) = 31 and cov(X, Y) = 18.
One person invests €27,000 in fund A and €13,000 in fund B, and further keeps €10,000 in a bank account
with a fixed yearly return of 2%. The yield after one year for this person is then
V = 270X + 130Y + 200
For another person who invests €8,000 in fund A and €19,000 in fund B while keeping €15,000 in a bank
account with a fixed yearly return of 2%, the yield after one year is
W = 80X + 190Y + 300
a. Calculate E(V) and var(V).
b. Calculate cov(V, W).
c. Is corr(V, W) related to corr(X, Y)? Explain.
Exercise H.5.5
Suppose you want to use a random sample to estimate the population parameter θ. For example, θ
could be a population mean μ or a regression coefficient βⱼ. You are considering using either the
estimator θ̂A or the estimator θ̂B.
a. Suppose that θ̂A is an unbiased estimator, while θ̂B is biased. Draw an example of the probability
distributions of the two estimators that illustrates the difference between unbiased and biased.
b. Suppose that θ̂A and θ̂B are both unbiased, while θ̂A is more efficient than θ̂B. Draw an
example of the probability distributions of the two estimators that illustrates the difference in
efficiency.
c. Suppose that θ̂A is consistent, but θ̂B is not. Draw an example of the probability distribution of
θ̂A for n = 10, 100, 1000, 10000. Draw these four distributions in one graph that illustrates
consistency. Do the same for θ̂B, thus illustrating inconsistency.
Exercise H.5.8 – Proof that the sample variance is unbiased and consistent
Assume that Y₁, ..., Yₙ are i.i.d. with mean μY and variance σY². Then the sample variance

SY² = (1/(n−1)) Σᵢ₌₁ⁿ (Yᵢ − Ȳ)² = (1/(n−1)) (Σᵢ₌₁ⁿ Yᵢ² − n·Ȳ²)

can be used as an estimator of the population variance σY².
a. Show that E(Yᵢ²) = σY² + μY² (hint: use a familiar formula)
b. In a similar fashion, show that E(Ȳ²) = σY²/n + μY²
c. Prove that SY² is an unbiased estimator of σY²
d. Assuming that large outliers are unlikely, prove that SY² is a consistent estimator of σY²
Exercise H.5.9 – Proof that the sample covariance is unbiased and consistent
Suppose the random variable pairs (X₁, Y₁), ..., (Xₙ, Yₙ) are i.i.d. Then the sample covariance

sXY = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) = (1/(n−1)) (Σᵢ₌₁ⁿ XᵢYᵢ − n·X̄Ȳ)

can be used as an estimator of the population covariance σXY.
a. Show that E(XᵢYᵢ) = σXY + μX·μY (hint: use a familiar formula)
b. Show that σX̄Ȳ = σXY/n (hint: cov(Xᵢ, Yⱼ) = ⋯ for i ≠ j) and E(X̄Ȳ) = σXY/n + μX·μY
c. Prove that sXY is an unbiased estimator of σXY
d. Assuming that large outliers are unlikely, prove that sXY is a consistent estimator of σXY
Exercise H.5.10
A population consists for one half of zeros and for the other half of sixes, so that μ = 3 and σ² = 9.
A random sample of size 3 is drawn from the population, yielding (X₁, X₂, X₃). Consider three estimators of
the population mean: the sample mean X̄, Y = (X₁ + X₂)/3, and Z = (1/5)X₁ + (2/5)X₂ + (2/5)X₃.
Exercise H.5.11
From the population distribution of the random variable X with mean μ and variance σ², a random
sample of size n = 3 is drawn. Consider Y = 0.6X₁ + 0.6X₂ − 0.2X₃ as an estimator of μ.
a. Calculate the expected value of Y.
b. Calculate the variance of Y.
c. Is the estimator Y unbiased? Is it efficient, compared with the sample mean X̄?
d. Suppose now that you have a set of quantitative data Xᵢ (i = 1, ..., n) sampled randomly from a
population with expected value μ. We already know of X̄ as estimator of μ: X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ.
But here, investigate this particular estimator of μ: (1/(n−1)) Σᵢ₌₁ⁿ Xᵢ. Is it unbiased? Is it consistent?
(Assume Xᵢ has finite fourth moments.)
Exercise H.5.12
Let X and Y denote draws from a bivariate normal distribution with E(X) = E(Y) = μ and var(X) =
var(Y) = 1. Suppose a covariance exists between X and Y that is equal to −0.5, so that
E[(X − μ)(Y − μ)] = −1/2. Consider the following two estimators of μ:

(i) X̄ = (1/2)X + (1/2)Y    (ii) X̃ = (1/3)X + (2/3)Y

a. Show that E(X̃) = μ. To what characteristic does the property E(X̃) = μ refer?
b. Determine the variance of both estimators, that is var(X̄) and var(X̃).
c. Which estimator is the most efficient one?
Week 5
Homework solutions
Solution H.5.1
a.
Σᵢ₌₁ⁿ (Xᵢ + Yᵢ) = (X₁ + Y₁) + ⋯ + (Xₙ + Yₙ) = (X₁ + ⋯ + Xₙ) + (Y₁ + ⋯ + Yₙ) = Σᵢ₌₁ⁿ Xᵢ + Σᵢ₌₁ⁿ Yᵢ
c.
Σᵢ₌₁ⁿ a = a + ⋯ + a = a(1 + ⋯ + 1) = a Σᵢ₌₁ⁿ 1 = na
d. Using respectively the rules in b., c. and a., we get:
Σᵢ₌₁ⁿ (aXᵢ + b) = a Σᵢ₌₁ⁿ Xᵢ + nb
Solution H.5.2
a) Every possible outcome is multiplied by 10. Then the mean is also multiplied by 10, the variance is
multiplied by 10² = 100, and the standard deviation is multiplied by √100 = 10. This can be derived
as follows, where Xᵢ are the original outcomes and Yᵢ the adapted ones:

Yᵢ = 10·Xᵢ

Ȳ = (1/n) Σᵢ₌₁ⁿ Yᵢ = (1/n) Σᵢ₌₁ⁿ 10Xᵢ = 10 · (1/n) Σᵢ₌₁ⁿ Xᵢ = 10·X̄

sY² = (1/(n−1)) Σᵢ₌₁ⁿ (Yᵢ − Ȳ)² = (1/(n−1)) Σ (10Xᵢ − 10X̄)² = (1/(n−1)) Σ 100(Xᵢ − X̄)² = 100 · (1/(n−1)) Σ (Xᵢ − X̄)² = 100·sX²

If the question was about the population (instead of a sample of size n), then we would use:
μY = 10·μX and σY² = 10²·σX² = 100·σX², so σY = 10·σX.
b) In this case, the transformation is not linear: Yᵢ = 10^(Xᵢ). To calculate what happens with the sample
mean Ȳ, variance sY² and standard deviation sY, we need the actual n observations.
For the same reason of nonlinearity of Y = 10^X we cannot use equations like E(aX + b) = aE(X) + b
and var(aX + b) = a²·var(X). Using E(a^X) = a^(E(X)) will fail, because it is false.
We would have to calculate, for example:

E(10^X) = Σₓ₌₁⁶ 10ˣ · P(X = x) = (1/6)·(10 + 100 + ⋯ + 1 000 000) = 185 185

(note: 10^(E(X)) = 10^3.5 = 3162.28 ≠ E(10^X) = 185 185, so E(a^X) ≠ a^(E(X)))
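The failure of E(a^X) = a^(E(X)) is easy to check numerically; a short Python sketch of the die calculation above:

```python
# E(10^X) for a fair die versus 10^{E(X)}: a nonlinear transformation
# does not commute with the expectation.
faces = [1, 2, 3, 4, 5, 6]
E_X = sum(faces) / 6                       # E(X) = 3.5
E_10X = sum(10 ** x for x in faces) / 6    # (10 + 100 + ... + 1000000) / 6
wrong = 10 ** E_X                          # 10^3.5, the incorrect shortcut
```

The two numbers disagree by almost two orders of magnitude (185 185 versus about 3 162).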
Solution H.5.3
a. P(Y = 0 | X = 0) = 0.12/0.6 = 0.2, so 20% for females
P(Y = 0 | X = 1) = 0.20/0.4 = 0.5, so 50% for males
b. P(Y = 0 | X = 0) = 0.12/0.6 = 0.2
P(Y = 1 | X = 0) = 0.21/0.6 = 0.35
P(Y = 2 | X = 0) = 0.27/0.6 = 0.45
E(Y | X = 0) = 0·0.2 + 1·0.35 + 2·0.45 = 1.25
E(Y² | X = 0) = 0²·0.2 + 1²·0.35 + 2²·0.45 = 2.15
var(Y | X = 0) = E(Y² | X = 0) − [E(Y | X = 0)]² = 2.15 − 1.25² = 0.5875
c. E(Y) = E[E(Y|X)] = E(Y | X = 0)·P(X = 0) + E(Y | X = 1)·P(X = 1)
= 1.25·0.6 + 0.6·0.4 = 0.99
note: E(Y | X = 1) = 0·(0.20/0.4) + 1·(0.16/0.4) + 2·(0.04/0.4) = 0.6
d. Dependent, because E(Y | X = 0) = 1.25 is not equal to E(Y) = 0.99.
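The table-based calculations above can be reproduced mechanically; a minimal Python sketch using the joint distribution from Exercise H.5.3:

```python
# Conditional moments and the law of iterated expectations for H.5.3.
joint = {(0, 0): 0.12, (0, 1): 0.21, (0, 2): 0.27,
         (1, 0): 0.20, (1, 1): 0.16, (1, 2): 0.04}

def p_x(x):
    # Marginal probability P(X = x).
    return sum(p for (xi, _), p in joint.items() if xi == x)

def e_y_given(x, power=1):
    # Conditional moment E(Y^power | X = x).
    return sum((y ** power) * p for (xi, y), p in joint.items() if xi == x) / p_x(x)

E0 = e_y_given(0)                                  # E(Y | X = 0)
var0 = e_y_given(0, 2) - E0 ** 2                   # var(Y | X = 0)
E_Y = sum(e_y_given(x) * p_x(x) for x in (0, 1))   # law of iterated expectations
```

This reproduces E(Y | X = 0) = 1.25, var(Y | X = 0) = 0.5875 and E(Y) = 0.99 from the solution.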
Solution H.5.4
a. V = 270X + 130Y + 200
E(V) = 270·E(X) + 130·E(Y) + 200 = 270·8 + 130·6 + 200 = 3 140
var(V) = 270²·var(X) + 130²·var(Y) + 2·270·130·cov(X, Y)
= 270²·47 + 130²·31 + 2·270·130·18 = 5 213 800
b. cov(V, W) = cov(270X + 130Y + 200, 80X + 190Y + 300)
= 270·80·cov(X, X) + 270·190·cov(X, Y) + 130·80·cov(Y, X) + 130·190·cov(Y, Y)
= 270·80·47 + 270·190·18 + 130·80·18 + 130·190·31 = 2 891 500
c. If V were a linear transformation of only X and W a linear transformation of only Y, then we
could immediately say that corr(V, W) = corr(X, Y). But that is not the case here.
Here, we need to calculate:
W = 80X + 190Y + 300
var(W) = 80²·var(X) + 190²·var(Y) + 2·80·190·cov(X, Y)
= 80²·47 + 190²·31 + 2·80·190·18 = 1 967 100
corr(V, W) = cov(V, W) / (√var(V)·√var(W)) = 0.903 ≠ 0.472 = cov(X, Y) / (√var(X)·√var(Y)) = corr(X, Y)
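These numbers follow directly from the linear-combination rules in the formula sheet; a quick Python check:

```python
import math

# Moments of V = 270X + 130Y + 200 and W = 80X + 190Y + 300 (Solution H.5.4).
EX, EY = 8, 6
varX, varY, covXY = 47, 31, 18

aV, bV, cV = 270, 130, 200
aW, bW, cW = 80, 190, 300

EV = aV * EX + bV * EY + cV
varV = aV**2 * varX + bV**2 * varY + 2 * aV * bV * covXY
varW = aW**2 * varX + bW**2 * varY + 2 * aW * bW * covXY
covVW = aV * aW * varX + (aV * bW + bV * aW) * covXY + bV * bW * varY
corrVW = covVW / math.sqrt(varV * varW)
corrXY = covXY / math.sqrt(varX * varY)
```

Running this confirms E(V) = 3 140, var(V) = 5 213 800, cov(V, W) = 2 891 500, and corr(V, W) ≈ 0.903 ≠ 0.472 ≈ corr(X, Y).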
Solution H.5.5
a. b.
c.
Note that for consistency, it is not required that the estimator is unbiased; it is sufficient that the estimator
is asymptotically unbiased.
Solution H.5.6
a. Independently and identically distributed.
b. In case of bivariate cross-sectional data (X₁, Y₁), ..., (Xₙ, Yₙ), the n objects for which the two
variables (Xᵢ, Yᵢ) are measured are obtained by repeated independent random draws from the same
population, so that you get a sequence of i.i.d. pairs of random variables.
Time-series data belong to consecutive periods (or moments) in time, which are thus not drawn
independently, and as a result the consecutive pairs of random variables (X₁, Y₁), ..., (Xₙ, Yₙ) are likely
to be dependent.
c. We say that Y converges in probability to a. Formally it means: when the sample size n increases to
infinity, then P(a − c < Y < a + c) converges to 1 for any given value c > 0.
In words: when 𝑛𝑛 increases to infinity, the probability that 𝑌𝑌 will be arbitrarily close to 𝑎𝑎 (as close as
you like) increases to 1. Or: when 𝑛𝑛 increases to infinity, then asymptotically 𝑌𝑌 will deviate only an
arbitrarily small amount from 𝑎𝑎.
Effectively: when 𝑛𝑛 increases to infinity the probability distribution of 𝑌𝑌 will become fully
concentrated at 𝑎𝑎.
d. The statement is called the Law of Large Numbers. It says that the sample mean converges in
probability to the population mean. (Convergence in probability is explained in the previous question).
e. If Y₁, ..., Yₙ are i.i.d. and each Yᵢ has mean μY and positive finite variance σY², then for increasing
values of n (up to infinity) the distribution of

(Ȳ − μY) / (σY/√n)

becomes arbitrarily well approximated by the standard normal distribution (and for increasing values of
n the distribution of Ȳ becomes arbitrarily well approximated by the normal distribution with mean
μY and standard deviation σY/√n).
Solution H.5.7
a. (i) Interpretation: Ȳ is an unbiased estimator of μY. Over repeated samples the estimated value is on
average correct, so there is no systematic over- or underestimation.
(ii) It is not implied that Ȳ² has expected value (μY)², because an expected value can only be
transferred with linear functions, not with nonlinear functions such as a quadratic function.
Consequently, Ȳ² is not an unbiased estimator of (μY)².
b. (i) Interpretation of Ȳ →ᵖ μY: Ȳ converges in probability to μY, so when n increases to infinity,
then asymptotically Ȳ will deviate only an arbitrarily small amount from μY. This means that Ȳ is a
consistent estimator of μY.
(ii) It is indeed implied that Ȳ² →ᵖ (μY)², as continuous functions preserve convergence in probability,
and a quadratic function is indeed continuous. Consequently Ȳ² is a consistent estimator of (μY)².
Solution H.5.8
a. σY² = E(Yᵢ²) − μY²  ⟹  E(Yᵢ²) = σY² + μY²

b. σȲ² = E(Ȳ²) − μȲ²  ⟹  E(Ȳ²) = σȲ² + μȲ² = σY²/n + μY², because σȲ² = σY²/n and μȲ = μY

c.
E(SY²) = E[(1/(n−1))·(Σᵢ₌₁ⁿ Yᵢ² − n·Ȳ²)] = (1/(n−1))·[E(Σᵢ₌₁ⁿ Yᵢ²) − E(n·Ȳ²)] = (1/(n−1))·[Σᵢ₌₁ⁿ E(Yᵢ²) − n·E(Ȳ²)]
= (1/(n−1))·[Σᵢ₌₁ⁿ (σY² + μY²) − n·(σY²/n + μY²)] = (1/(n−1))·(n·σY² + n·μY² − σY² − n·μY²) = (1/(n−1))·(n−1)·σY² = σY²

In turn,

SY² = (1/(n−1))·(Σᵢ₌₁ⁿ Yᵢ² − n·Ȳ²) = (n/(n−1))·((1/n) Σᵢ₌₁ⁿ Yᵢ² − Ȳ²) →ᵖ 1·(σY² + μY² − μY²) = σY²

since n/(n−1) → 1 for n → ∞ and continuous functions preserve convergence in probability.
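Unbiasedness of SY² can also be checked exactly, without algebra, by enumerating every possible sample from a small population. A Python sketch using the two-point population of Exercise H.5.10 (μ = 3, σ² = 9):

```python
from itertools import product

def sample_var(ys):
    # Sample variance with the (n - 1) divisor.
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

# All 2^3 = 8 equally likely samples of size 3 from the population {0, 6}.
samples = list(product([0, 6], repeat=3))
E_S2 = sum(sample_var(s) for s in samples) / len(samples)   # exact E(S^2)
```

The average of S² over all equally likely samples equals σ² = 9 exactly, which is precisely what E(SY²) = σY² asserts.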
Solution H.5.9
a. σXY = E(XᵢYᵢ) − μX·μY  ⟹  E(XᵢYᵢ) = σXY + μX·μY
b. Since Xᵢ and Yⱼ are independent for i ≠ j, we have cov(Xᵢ, Yⱼ) = 0 for i ≠ j, so that

σX̄Ȳ = cov((1/n) Σᵢ₌₁ⁿ Xᵢ, (1/n) Σⱼ₌₁ⁿ Yⱼ) = (1/n)·(1/n)·cov(Σᵢ₌₁ⁿ Xᵢ, Σⱼ₌₁ⁿ Yⱼ) = (1/n²) Σᵢ₌₁ⁿ cov(Xᵢ, Yᵢ) = (1/n²)·n·σXY = σXY/n

As a result,

σX̄Ȳ = E(X̄Ȳ) − μX̄·μȲ  ⟹  E(X̄Ȳ) = σX̄Ȳ + μX̄·μȲ = σXY/n + μX·μY

because μX̄ = μX and μȲ = μY.
c.
E(sXY) = E[(1/(n−1))·(Σᵢ₌₁ⁿ XᵢYᵢ − n·X̄Ȳ)] = (1/(n−1))·[Σᵢ₌₁ⁿ E(XᵢYᵢ) − n·E(X̄Ȳ)]
= (1/(n−1))·[Σᵢ₌₁ⁿ (σXY + μX·μY) − n·(σXY/n + μX·μY)] = (1/(n−1))·(n·σXY + n·μX·μY − σXY − n·μX·μY) = ((n−1)/(n−1))·σXY = σXY

In turn,

sXY = (1/(n−1))·(Σᵢ₌₁ⁿ XᵢYᵢ − n·X̄Ȳ) = (n/(n−1))·((1/n) Σᵢ₌₁ⁿ XᵢYᵢ − X̄Ȳ) →ᵖ 1·(σXY + μX·μY − μX·μY) = σXY

since n/(n−1) → 1 for n → ∞ and continuous functions preserve convergence in probability.
Solution H.5.10
a. E(X̄) = μ is known, so the bias is 0.
E(Y) = (E(X₁) + E(X₂))/3 = (μ + μ)/3 = (2/3)μ, so the bias is (2/3)μ − μ = −(1/3)μ.
E(Z) = (1/5)E(X₁) + (2/5)E(X₂) + (2/5)E(X₃) = (1/5)μ + (2/5)μ + (2/5)μ = μ, so the bias is 0.
var(Z) = var((1/5)X₁) + var((2/5)X₂) + var((2/5)X₃) = (1/5)²·var(X₁) + (2/5)²·var(X₂) + (2/5)²·var(X₃)
= (1/25)σ² + (4/25)σ² + (4/25)σ² = (9/25)σ²
(1/3)σ² < (9/25)σ², so X̄ is more efficient than Z.
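Because the population has only two values, the bias and variance of all three estimators can be computed exactly by enumerating the 8 equally likely samples; a Python sketch:

```python
from itertools import product

def moments(est):
    # Exact mean and variance of an estimator over all equally likely samples
    # of size 3 from the population {0, 6}.
    vals = [est(s) for s in product([0, 6], repeat=3)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

m_xbar, v_xbar = moments(lambda s: sum(s) / 3)                        # sample mean
m_y,    v_y    = moments(lambda s: (s[0] + s[1]) / 3)                 # Y
m_z,    v_z    = moments(lambda s: (s[0] + 2 * s[1] + 2 * s[2]) / 5)  # Z
```

X̄ and Z come out unbiased (mean 3 = μ), Y comes out biased (mean 2); among the unbiased estimators, var(X̄) = 3 < 3.24 = var(Z).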
Solution H.5.11
a. E(Y) = 0.6·E(X₁) + 0.6·E(X₂) − 0.2·E(X₃) = 0.6μ + 0.6μ − 0.2μ = μ
b. var(Y) = var(0.6·X₁) + var(0.6·X₂) + var(−0.2·X₃), since the three terms are mutually
independent.
var(Y) = 0.6²·var(X₁) + 0.6²·var(X₂) + (−0.2)²·var(X₃) = 0.36σ² + 0.36σ² + 0.04σ² = 0.76σ²
c. Y is unbiased, since E(Y) = μ.
Y is not efficient compared with X̄, since var(Y) = 0.76σ² > var(X̄) = (1/3)σ².
d.
E((1/(n−1)) Σᵢ₌₁ⁿ Xᵢ) = (1/(n−1)) Σᵢ₌₁ⁿ E(Xᵢ) = (1/(n−1))·(μ + ⋯ + μ) = (1/(n−1))·n·μ = (n/(n−1))·μ ≠ μ, so it is biased.

var((1/(n−1)) Σᵢ₌₁ⁿ Xᵢ) = (1/(n−1))² · var(Σᵢ₌₁ⁿ Xᵢ) = (1/(n−1))² · Σᵢ₌₁ⁿ var(Xᵢ) = (1/(n−1))² · (σ² + ⋯ + σ²) = n·σ²/(n−1)²

When n increases to infinity, the expected value (n/(n−1))·μ = μ/(1 − 1/n) converges to μ, while the
variance n·σ²/(n−1)² converges to 0. This means that the estimator is consistent –
even though it is biased.
Solution H.5.12
a) E(X̃) = E((1/3)X + (2/3)Y) = (1/3)μ + (2/3)μ = μ, so X̃ is an unbiased estimator.
b) var(X̄) = (1/2)²·var(X) + (1/2)²·var(Y) + 2·(1/2)·(1/2)·cov(X, Y) = 1/4·1 + 1/4·1 − 1/4 = 1/4
var(X̃) = (1/3)²·var(X) + (2/3)²·var(Y) + 2·(1/3)·(2/3)·cov(X, Y) = 1/9·1 + 4/9·1 − 2/9 = 3/9 = 1/3
c) var(X̄) = 1/4 < 1/3 = var(X̃), so X̄ is the more efficient estimator.
Week 5
Tutorial exercises
Question T.5.1
An airline company has a file containing specific information with respect to all their customers. Based on
that file, the following relative frequency table is determined, where for a given year:
X = the number of private flights of the customer
Y = the number of business flights of the customer

              Y
          0     1     2     3    total
     0   0.00  0.24  0.11  0.05   0.40
X    1   0.23  0.15  0.08  0.04   0.50
     2   0.07  0.02  0.01  0.00   0.10
total    0.30  0.41  0.20  0.09   1.00
Assume this table represents the population distribution of (X, Y) when a random customer is drawn. One
can calculate from the table:

E(X) = 0.7     var(X) = 0.41      cov(X, Y) = −0.246
E(Y) = 1.08    var(Y) = 0.8536    E(Y | X = 0) = 1.525    E(Y | X = 1) = 0.86
a. Give the probability distribution of the number of business flights, when it is given that the
customer took 2 private flights.
b. Are X and Y independent? Show it.
c. Determine the expected number of business flights, when it is given that the customer has 2
private flights. (Always use the proper notation!) How does your answer provide information
about (in)dependence of X and Y?
d. Determine the standard deviation of the number of business flights, when it is given that the
customer took 2 private flights.
e. Show the calculation of E(Y) according to the law of iterated expectations.
Suppose the revenue generated by the private flights of a customer is G = 142·X − 21 euro, while for
the business flights this revenue is H = 197·Y − 34 euro.
f. Calculate var(G) and var(H).
g. Calculate cov(G, H).
h. Calculate corr(G, H) and compare it with corr(X, Y).
Question T.5.2
a. Suppose the random variables X and Y are uncorrelated. Are X and Y independent?
b. Is the sample variance S² of i.i.d. variables an unbiased and consistent estimator of the population
variance σ² when large outliers are unlikely? (for a proof: see H.5.8) Express both properties with
formulas and interpret.
c. Is it implied that S is an unbiased and consistent estimator of σ?
d. Is the sample covariance SXY of i.i.d. pairs of variables an unbiased and consistent estimator of the
population covariance σXY when large outliers are unlikely? (for a proof: see H.5.9) And is the
sample correlation rXY then an unbiased and consistent estimator of the population correlation
corr(Xᵢ, Yᵢ)?
Question T.5.3
Consider the model Yᵢ = α + uᵢ, where α is a parameter and uᵢ is a random error term. (This is a linear
regression model with only a constant.) Assume that E(uᵢ) = 0, var(uᵢ) = σ² and that the errors are
independent.
a. Determine the least squares estimator of α.
b. Prove that the least squares estimator of α is unbiased.
c. Calculate the (population) variance of the least squares estimator of α.
d. Show that the least squares estimator of α is consistent.
e. What is the distribution of the least squares estimator of α?
f. Consider the alternative estimator a* = (Y₁ + Y₂)/2. Prove it is unbiased and calculate its population
variance. Is this estimator consistent and efficient?
g. Is unbiasedness a stronger property than consistency? Or vice versa?
Question T.5.4
Suppose the following commands are given in STATA in an empty data file:
set obs 1000
generate x=2+4*rnormal()
gen y =3+5*x+7*rnormal()
These commands generate a random sample of (x, y).
a. What is the population distribution of X?
b. What is the population distribution of Y?
c. What is the population covariance of X and Y?
d. What type of distribution does (X, Y) have?
Week 6
Homework exercises
Exercise H.6.1 (continued in H.7.1)
Fifteen observations were taken to estimate a simple regression model. These summations were produced:
Σ Xᵢ = 50    Σ Xᵢ² = 250    Σ Yᵢ = 100    Σ Yᵢ² = 1100    Σ XᵢYᵢ = 500
Exercise H.6.3
she exercised and she also measured his or her cholesterol levels. The results have been analyzed in SPSS
using the following variables:
EXERCISE = weekly exercises in minutes
BEFORE = cholesterol level before exercise program
AFTER = cholesterol level after exercise program
The reduction in cholesterol is calculated as: RED = BEFORE − AFTER
REGRESSION

Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       EXERCISE(a)         .                   Enter
a. All requested variables entered.
b. Dependent Variable: RED

Model Summary

ANOVA(b)
Model          Sum of Squares   df   Mean Square   F       Sig.
1 Regression       .....         1      .....      .....   .000(a)
  Residual         .....        48      .....
  Total            .....        49
a. Predictors: (Constant), EXERCISE
b. Dependent Variable: RED

Coefficients(a)
                 Unstandardized Coefficients    Standardized Coefficients
Model            B          Std. Error          Beta                         t       Sig.
1 (Constant)     .....      3.939                                            .....   .605
  EXERCISE       .....      .....               .714                         .....   .000
a. Dependent Variable: RED

CORRELATIONS

Descriptive Statistics

Correlations
                                          EXERCISE      RED
EXERCISE   Pearson Correlation            1.000         .714**
           Sig. (2-tailed)                .             .000
           Sum of Squares and
           Cross-products                 668424.02     60789.400
           Covariance                     13641.307     1240.600
           N                              50            50
RED        Pearson Correlation            .714**        1.000
           Sig. (2-tailed)                .000          .
           Sum of Squares and
           Cross-products                 60789.400     10850.000
           Covariance                     1240.600      221.429
           N                              50            50
**. Correlation is significant at the 0.01 level (2-tailed).
d) What does the regression predict will be the expenditure of a person with an income of €100? With
an income of €200?
e) Will the regression give reliable predictions for a person with an income of €2 million? Why or why
not?
f) Given what you know about the distribution of earnings, do you think it is plausible that the distribution
of errors in the regression is normal? (Hint: Do you think that the distribution is symmetric
or skewed? What is the smallest value of earnings, and is it consistent with a normal distribution?)
a) Explain what the term uᵢ represents. Why will different participants have different values of uᵢ?
b) What is E(uᵢ|Xᵢ)? Are the estimated coefficients unbiased?
c) The estimated regression is
Ŷᵢ = 55 + 0.17Xᵢ
Compute the estimated gain in score for a participant who is given an additional 5 minutes to nap.
which satisfies the LS assumptions. Unfortunately, the role of the two variables was confused and
the following sample regression function was estimated using the least squares estimator:

Xᵢ = γ₀ + γ₁Yᵢ + νᵢ

Show with a formula how 1/γ̂₁ is calculated and find the value to which 1/γ̂₁ converges in probability.
Also prove that 1/γ̂₁ is an inconsistent estimator of β₁.
Week 6
Homework solutions
Formulas
Sample variances and sample covariances can be calculated by their defining formulas:

sX² = (1/(n−1)) · Σ (Xᵢ − X̄)²    and    sXY = (1/(n−1)) · Σ (Xᵢ − X̄)(Yᵢ − Ȳ)

and by short-cut formulas which are often more convenient:

sX² = (1/(n−1)) · (Σ Xᵢ² − n·X̄²)    and    sXY = (1/(n−1)) · (Σ XᵢYᵢ − n·X̄Ȳ)
Solution H.6.1
Σ Xᵢ = 50    Σ Xᵢ² = 250    Σ Yᵢ = 100    Σ Yᵢ² = 1100    Σ XᵢYᵢ = 500
a)
sX² = (1/(n−1)) · (Σ Xᵢ² − (Σ Xᵢ)²/n) = (1/14) · (250 − 50²/15) = 5.952

sXY = (1/(n−1)) · (Σ XᵢYᵢ − (Σ Xᵢ)(Σ Yᵢ)/n) = (1/14) · (500 − 50·100/15) = 11.905

β̂₁ = sXY/sX² = 2.000 is the slope of the sample regression line

X̄ = (1/n) Σ Xᵢ = 50/15 = 3.333 and Ȳ = (1/n) Σ Yᵢ = 100/15 = 6.667

β̂₀ = Ȳ − β̂₁X̄ = 0 is the intercept of the sample regression line

Sample regression line: Ŷ = β̂₀ + β̂₁X = 2X
b) Ŷ = 2·2.9 = 5.8
c)
sY² = (1/(n−1)) · (Σ Yᵢ² − (Σ Yᵢ)²/n) = (1/14) · (1100 − 100²/15) = 30.952

SSR = (n−1) · (sY² − sXY²/sX²) = 14 · (30.952 − 11.905²/5.952) = 100.0

sû² = SSR/(n−2) = 100.0/13 = 7.692  ⟹  SER = sû = √sû² = 2.774

The standard error of the regression (SER) is an estimator of the standard deviation of the
regression error uᵢ = Yᵢ − β₀ − β₁Xᵢ.
d)
sY² = (1/(n−1)) · TSS  ⟹  TSS = 14·30.952 = 433.328
e)
ANOVA        Sum of Squares    df    Mean Square
Explained    ESS = 333.328      1    333.328
Residuals    SSR = 100.0       13    7.692
Total        TSS = 433.328     14    30.952

R² = ESS/TSS = 0.769
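All of the quantities in Solution H.6.1 follow from the five summary sums alone; a Python sketch of the shortcut formulas:

```python
# Simple regression computed from summary sums only (Solution H.6.1).
n = 15
Sx, Sxx, Sy, Syy, Sxy = 50, 250, 100, 1100, 500

sx2 = (Sxx - Sx**2 / n) / (n - 1)     # sample variance of X
sy2 = (Syy - Sy**2 / n) / (n - 1)     # sample variance of Y
sxy = (Sxy - Sx * Sy / n) / (n - 1)   # sample covariance

b1 = sxy / sx2                        # slope
b0 = Sy / n - b1 * Sx / n             # intercept
SSR = (n - 1) * (sy2 - sxy**2 / sx2)  # residual sum of squares
TSS = (n - 1) * sy2                   # total sum of squares
R2 = 1 - SSR / TSS
SER = (SSR / (n - 2)) ** 0.5          # standard error of the regression
```

Running this reproduces β̂₁ = 2, β̂₀ = 0, SSR = 100, R² ≈ 0.769 and SER ≈ 2.774 from the solution.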
Solution H.6.2
Σ Xᵢ = 67    Σ Xᵢ² = 659    Σ Yᵢ = 5924    Σ Yᵢ² = 4 671 440    Σ XᵢYᵢ = 54 559
a) So

β̂₁ = sXY/sX² = (Σ XᵢYᵢ − (Σ Xᵢ)(Σ Yᵢ)/n) / (Σ Xᵢ² − (Σ Xᵢ)²/n) = (54 559 − 67·5924/8) / (659 − 67²/8) = 4945.5/97.875 = 50.529

β̂₀ = (1/n) Σ Yᵢ − β̂₁ · (1/n) Σ Xᵢ = 5924/8 − 50.529·67/8 = 317.32

Regression line: Ŷ = β̂₀ + β̂₁X = 317.32 + 50.529·X, where Y: #customers and X: #ads
The estimated intercept β̂₀ is 317.32 customers. The estimated slope β̂₁ is 50.529 customers per ad.
b)
sY² = (1/(n−1)) · (Σ Yᵢ² − (Σ Yᵢ)²/n) = (1/7) · (4 671 440 − 5924²/8) = 40 674

sXY = (1/(n−1)) · (Σ XᵢYᵢ − (Σ Xᵢ)(Σ Yᵢ)/n) = (1/7) · 4945.5 = 706.5

sX² = (1/(n−1)) · (Σ Xᵢ² − (Σ Xᵢ)²/n) = (1/7) · 97.875 = 13.982

sû² = SSR/(n−2) = ((n−1)/(n−2)) · (sY² − sXY²/sX²) = (7/6) · (40 674 − 706.5²/13.982) = 5804.26  ⟹  sû = √sû² = 76.19

The standard error of the regression (SER = sû) is an estimator of the standard deviation of the
regression error uᵢ = Yᵢ − β₀ − β₁Xᵢ.
c)
R² = ESS/TSS = 1 − SSR/TSS = 1 − (n−2)·sû²/((n−1)·sY²) = 1 − 6·5804.26/(7·40 674) = 0.878

The regression R² is the fraction of the sample variance sY² explained by (or predicted by) X.
d)
R² = 0.878
Solution H.6.3
EXERCISE = weekly exercises in minutes
BEFORE = cholesterol level before exercise program
AFTER = cholesterol level after exercise program
The reduction in cholesterol is calculated as: RED = BEFORE − AFTER
a)
β̂₁ = sXY/sX² = 1240.600/13641.307 = 0.0909444, or β̂₁ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² = 60 789.400/668 424.020 = 0.0909444

For each additional minute of exercise, cholesterol is reduced, on average, by 0.09094.

β̂₀ = Ȳ − β̂₁X̄ = 27.8000 − 0.0909444·283.1400 = 2.050

predicted RED = 2.050 + 0.09094·EXERCISE
b)
R² = 0.510
c) Hence

R² = rXY² = 0.714² = 0.510

51% is the fraction of the variation of the variable RED that is explained by the variation of the variable
EXERCISE.

sû = √(SSR/(n−2)) = √(5316.5/48) = 10.5293

sû is an estimator of the standard deviation of the regression error uᵢ = Yᵢ − β₀ − β₁Xᵢ.
Solution H.6.4
The sample size is n = 100. The estimated regression equation is

Ŵeight = −79.24 + 4.16·Height,   R² = 0.72,   SER = 12.6

a. Substituting Height = 64, 68, and 72 inches into the equation, the predicted weights are
−79.24 + 4.16·64 = 187 pounds, −79.24 + 4.16·68 = 203.64 pounds, and
−79.24 + 4.16·72 = 220.28 pounds.
b. ΔŴeight = 4.16·ΔHeight = 4.16·2 = 8.32 pounds
c. First, we rewrite the original estimated regression equation as
Ŷ = β̂₀ + β̂₁X,
and express the estimated regression equation in the centimeter-kilogram space as
Ŷkg = γ̂₀ + γ̂₁Xcm
Using the conversion of measurements Ykg = 0.4536·Y and Xcm = 2.54·X in the LS formulas for the
slope and intercept gives

R²cm,kg = r²XcmYkg = (2.54·0.4536·sXY)² / ((2.54²·sX²)·(0.4536²·sY²)) = sXY²/(sX²·sY²) = rXY² = R²

and therefore remains at 0.72.
The residuals can be expressed as

Ykg − γ̂₀ − γ̂₁Xcm = 0.4536·Y − 0.4536·β̂₀ − (0.4536/2.54)·β̂₁·2.54·X = 0.4536·(Y − β̂₀ − β̂₁X)

and therefore

SERcm,kg = √(0.4536²)·SER = 0.4536·12.6 = 5.71536.
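The unit-conversion result can be confirmed numerically; a Python sketch with made-up height/weight data (not the data of the exercise):

```python
# Changing units rescales slope and intercept but leaves R^2 unchanged.
H = [62.0, 65.0, 68.0, 70.0, 74.0]       # heights in inches (hypothetical)
W = [120.0, 140.0, 155.0, 160.0, 180.0]  # weights in pounds (hypothetical)

def ols(X, Y):
    # Simple-regression slope, intercept and R^2.
    n = len(X)
    xb, yb = sum(X) / n, sum(Y) / n
    sxy = sum((x - xb) * (y - yb) for x, y in zip(X, Y))
    sxx = sum((x - xb) ** 2 for x in X)
    syy = sum((y - yb) ** 2 for y in Y)
    b1 = sxy / sxx
    return b1, yb - b1 * xb, sxy ** 2 / (sxx * syy)

b1, b0, r2 = ols(H, W)
g1, g0, r2_metric = ols([2.54 * h for h in H], [0.4536 * w for w in W])
```

Up to floating-point rounding, the slope scales by 0.4536/2.54, the intercept by 0.4536, and R² is identical, mirroring SERcm,kg = 0.4536·SER above.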
Solution H.6.5
a) The coefficient 8.8 shows the marginal effect of monthly income on monthly expenditure; that is,
expenditure is expected to increase by €8.8 for each additional euro of monthly income.
The intercept of the regression line is 710.7. It determines the overall level of the line. The interpretation
is that a worker with €0 average monthly income has predicted average monthly expenditures of
€710.7. This interpretation is not sensible, however, since a monthly income of €0 is not realistic. Moreover,
prediction is not safe for a value outside the range of observed monthly income (€100 to
€1.5 million), and a monthly income of €0 is outside this range.
b) SER is in the same units as the dependent variable (Y, or expenditure in this example). Thus, SER is
measured in euros per month.
c) R² is unit free.
d) Substituting an income of 100 and of 200 into the equation, the predicted monthly expenditures are
710.7 + 8.8·100 = €1590.7 and 710.7 + 8.8·200 = €2470.7, respectively.
e) No. The highest income in the sample is €1.5 million, so €2 million is far outside the range of the sample
data.
f) No. The distribution of earnings is positively skewed and has kurtosis larger than the normal.
Solution H.6.6
a) The error term uᵢ represents factors other than time that influence the participant's performance on
the test, including inherent cognitive ability and aptitude. Some participants may have better memories
and some might have weaker ones.
b) Because of random assignment, uᵢ is independent of Xᵢ. Since uᵢ represents deviations from average,
E(uᵢ) = 0. Because uᵢ is independent of Xᵢ for all i = 1, ..., n, we have E(uᵢ|Xᵢ) = E(uᵢ) = 0.
This means that the estimated coefficients will be unbiased.
c) The estimated gain in score equals
ΔŶ = 0.17·ΔX = 0.17·5 = 0.85.
Solution H.6.7
When you do OLS with the data of Dᵢ and Fᵢ, you get the following:

γ̂₁ = sFD/sF² = s(32+1.8C),(1.25E) / s²(32+1.8C) = (1.8·1.25·sCE) / (1.8²·sC²) = (1.25/1.8)·β̂₁ = (1.25/1.8)·2.16 = 1.5

γ̂₀ = D̄ − γ̂₁F̄ = 1.25Ē − 1.5·(32 + 1.8C̄) = 1.25Ē − 48 − 2.7C̄ = 1.25·(Ē − 2.16C̄) − 48 = 1.25β̂₀ − 48 = 1.25·(−4.32) − 48 = −53.4
Solution H.6.8
• E(X) = 5 + 2 ∙ 0 = 5 and E(U) = 0, so E(Y) = 60 − 3 ∙ 5 + 0 = 45
• As X and U are independent, we have var(Y) = (−3)² var(X) + var(U) = 9 ∙ 2² + 7² = 85
• E(Y|X) = 60 − 3X + E(U|X) = 60 − 3X + E(U) = 60 − 3X
• var(Y|X) = var(U|X) = var(U) = 7² = 49
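These moment calculations can be verified with a quick Monte Carlo check. The sketch below assumes normal distributions for the random components; the exercise only fixes their means and variances:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Distributional forms are an assumption; only the moments are given:
# X = 5 + 2*Z with E(Z) = 0, var(Z) = 1, and U independent with sd 7.
Z = rng.normal(0, 1, n)
U = rng.normal(0, 7, n)
X = 5 + 2 * Z
Y = 60 - 3 * X + U

assert abs(Y.mean() - 45) < 0.1   # E(Y) = 45
assert abs(Y.var() - 85) < 1.0    # var(Y) = 9*4 + 49 = 85
```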
Solution H.6.9
a) With the law of iterated expectations we have E(u_i) = E[E(u_i | X_i)] = E(0) = 0.
Using this and the law of iterated expectations once more we get:
cov(X_i, u_i) = E(X_i u_i) − E(X_i) E(u_i) = E(X_i u_i) = E[E(X_i u_i | X_i)] = E[X_i E(u_i | X_i)] = E(0) = 0
b) Applying LS to the second equation leads to γ̂1 = s_XY / s_Y², so 1/γ̂1 = s_Y² / s_XY.
Now 1/γ̂1 = s_Y² / s_XY →p var(Y_i) / cov(X_i, Y_i) = σ_Y² / σ_XY. Note that due to the second LS assumption that (X_i, Y_i) are i.i.d. for i = 1, …, n, we know that var(Y_i) and cov(X_i, Y_i) are constant.
In addition it follows that u_i = Y_i − β0 − β1 X_i is also i.i.d. with constant var(u_i).
The first LS assumption says E(u_i | X_i) = 0 so that cov(X_i, u_i) = 0 (see part a), so that
cov(X_i, Y_i) = β1 var(X_i) + cov(X_i, u_i) = β1 var(X_i) and
var(Y_i) = β1² var(X_i) + var(u_i) + 2β1 cov(X_i, u_i) = β1² var(X_i) + var(u_i)
Therefore we get:
1/γ̂1 = s_Y² / s_XY →p var(Y_i) / cov(X_i, Y_i) = (β1² var(X_i) + var(u_i)) / (β1 var(X_i)) = β1 + var(u_i) / (β1 var(X_i)) ≠ β1
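The inconsistency of the reverse regression can be illustrated by simulation. A sketch under assumed parameter values (β1 = 1.5 and unit variances; none of these numbers come from the exercise):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta0, beta1 = 2.0, 1.5             # illustrative values

X = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)             # E(u|X) = 0 holds by construction
Y = beta0 + beta1 * X + u

# Forward LS slope (consistent for beta1) and reverse LS slope
b1_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
g1_hat = np.cov(X, Y)[0, 1] / np.var(Y, ddof=1)

# 1/g1_hat converges to beta1 + var(u)/(beta1*var(X)), not to beta1
plim = beta1 + 1.0 / (beta1 * 1.0)
assert abs(b1_hat - beta1) < 0.02
assert abs(1 / g1_hat - plim) < 0.05
```

With these values 1/γ̂1 settles near 1.5 + 1/1.5 ≈ 2.17, illustrating the ≠ β1 conclusion above.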
Solution H.6.10
a)
E(b1) = E(β̂1) + E(1/n) = β1 + 1/n and var(b1) = var(β̂1) = (1/n) ∙ var[(X_i − μ_X) u_i] / [var(X_i)]²
b) Clearly, b1 is not unbiased as
E(b1) = β1 + 1/n ≠ β1
However, b1 is consistent, since E(b1) = β1 + 1/n → β1 and var(b1) = (1/n) ∙ var[(X_i − μ_X) u_i] / [var(X_i)]² → 0 when n → ∞.
28
Week 6
Tutorial exercises
Question T.6.1 (adapted from Exam 28-3-2008)
A company produces external disk drives. The management wants to design a regression model with the number of disk drives sold as the dependent variable and its price as the explanatory variable. (The prices of competitors are also recorded.)
observation   price of competitors (PCOMP)   Price   Number sold
 1            120                            100     102
 2            140                            110     100
 3            190                             90     120
 4            130                            150      77
 5            155                            210      46
 6            175                            150      93
 7            125                            250      26
 8            145                            270      69
 9            180                            300      65
10            150                            250      85
Correlations
                                               PCOMP      Price       Number
PCOMP    Pearson Correlation                   1          …..         .318
         Sig. (2-tailed)                                  …..         .371
         Sum of Squares and Cross-products     5190.000   820.000     1927.000
         Covariance                            576.667    91.111      214.111
         N                                     10         10          10
Price    Pearson Correlation                   …..        1           -.726
         Sig. (2-tailed)                       …..                    .017
         Sum of Squares and Cross-products     820.000    53760.000   -14164.000
         Covariance                            91.111     5973.333    -1573.778
         N                                     10         10          10
Number   Pearson Correlation                   .318       -.726       1
         Sig. (2-tailed)                       .371       .017
         Sum of Squares and Cross-products     1927.000   -14164.000  7076.100
         Covariance                            214.111    -1573.778   786.233
         N                                     10         10          10
MODEL 1
Model Summary(b)
1 .468 1.855
a. Predictors: (Constant), Price b. Dependent Variable: Number
29
ANOVA(b)
Model            Sum of Squares   df   Mean Square   F   Sig.
1   Regression
    Residual
    Total
a. Predictors: (Constant), Price
b. Dependent Variable: Number
a. What are the estimates for the constant and slope of the regression line in the simple regression
model, with the variable ‘Price’ as the explanatory variable?
b. What is the point prediction for the number of sold drives when the price equals 130?
c. Enter the missing values in the ANOVA table for MODEL 1.
d. Give the interpretation of β̂1, β̂0, Ŷ, SER, and R².
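The SPSS "Correlations" output above can be reproduced from the raw data. A sketch, assuming the data columns are read as observation, PCOMP, Price, Number sold (this reading matches the sums of squares reported in the output):

```python
import numpy as np

pcomp  = np.array([120, 140, 190, 130, 155, 175, 125, 145, 180, 150])
price  = np.array([100, 110,  90, 150, 210, 150, 250, 270, 300, 250])
number = np.array([102, 100, 120,  77,  46,  93,  26,  69,  65,  85])

def ss_cp(x, y):
    """Sum of squares and cross-products, as reported by SPSS."""
    return np.sum((x - x.mean()) * (y - y.mean()))

assert ss_cp(pcomp, pcomp) == 5190.0
assert ss_cp(price, price) == 53760.0
assert np.isclose(ss_cp(number, number), 7076.1)
assert np.isclose(ss_cp(price, number), -14164.0)
# Covariance divides the cross-product by n - 1 = 9
assert np.isclose(ss_cp(price, number) / 9, -1573.778, atol=1e-3)
assert round(float(np.corrcoef(price, number)[0, 1]), 3) == -0.726
```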
30
Exercise T.6.3 (midterm 28-9-2017, Q2ab modified)
In practice the researcher does not know the exact data generating process. Here you will consider a case
where you do know the exact data generating process, so that you can use this knowledge to evaluate the
quality of the least squares (LS) estimators.
Assume that Y_i = β0 + β1 X_i + u_i for i = 1, …, n, where β0 and β1 are unknown parameters and each pair (X_i, u_i) is drawn randomly from the following simultaneous distribution:

           u_i = −2   u_i = −1   u_i = 0   u_i = 1   u_i = 2   total
X_i = 0    0.02       0.08       0.22      0.04      0.06      0.42
X_i = 1    0.08       0.10       0.24      0.10      0.06      0.58
total      0.10       0.18       0.46      0.14      0.12      1.00

a. Give the LS assumptions (Key Concept 4.3) of this model and check the validity of each assumption.
b. Suppose data of (X_i, Y_i) are obtained for i = 1, …, n. Is the LS estimator of β1 unbiased?
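The conditional expectations E(u_i | X_i) needed for part a can be read off the joint distribution table directly; a sketch:

```python
import numpy as np

u_vals = np.array([-2, -1, 0, 1, 2])
# Joint probabilities P(X_i = x, u_i = u) from the table
p = np.array([[0.02, 0.08, 0.22, 0.04, 0.06],   # X_i = 0
              [0.08, 0.10, 0.24, 0.10, 0.06]])  # X_i = 1

assert np.isclose(p.sum(), 1.0)
# E(u_i | X_i = x): row-wise expectation divided by the marginal P(X_i = x)
cond_mean = (p * u_vals).sum(axis=1) / p.sum(axis=1)
assert np.isclose(cond_mean[0], 0.04 / 0.42)    # E(u_i | X_i = 0)
assert np.isclose(cond_mean[1], -0.04 / 0.58)   # E(u_i | X_i = 1)
```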
31
Week 7
Homework exercises.
Exercise H.7.1 (continuation of H.6.1)
Refer to Exercise H.6.1 on page 19 and its Solution H.6.1 on page 24.
Test if there is sufficient evidence to conclude that X has an effect on Y. Use α = 0.10 and the assumption of homoskedasticity.
Give (1) hypotheses, (2) test statistic and its distribution, (3) conditions, (4) rejection region, (5) outcome, (6) confrontation and decision, (7) conclusion.
Exercise H.7.4
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.778623 2.226733 0.80 0.425 -2.596326 6.153572
_cons | 15.27447 11.90309 1.28 0.200 -8.111984 38.66093
------------------------------------------------------------------------------
a. Given the obtained regression, what is the error in the estimation of the slope?
32
b. Use the relevant confidence interval in the STATA output to test whether H0: β1 = 7 is rejected in favour of H1: β1 < 7 (usual notation). Which significance level do you use?
c. When a new sample with the same number of observations is generated, what do you know about
the distribution of the resulting slope estimator?
33
Exercise H.7.9 (S&W 5.6)
In the 1980s, Tennessee conducted an experiment in which a large sample of kindergarten students were
randomly assigned to “regular” and “small” classes and given standardized tests at the end of the year.
(Regular classes contained approximately 24 students, and small classes contained approximately 15
students.) Suppose, in the population, the standardized tests have a mean score of 925 points and a
standard deviation of 75 points. Let SmallClass denote a binary variable equal to 1 if the student is
assigned to a small class and equal to 0 otherwise. A regression of TestScore on SmallClass yields
TestScore = 918.0 + 13.9 ∙ SmallClass,  R² = 0.01,  SER = 74.6
            (1.6)   (2.5)
(between parentheses: heteroscedastic-robust standard errors)
a. Do you think that the regression errors are plausibly homoskedastic? Explain.
We can construct a 99% confidence interval for the effect of SmallClass on TestScore using formula (5.3):
SE(β̂1) = √(σ̂²_β̂1)
b. Now, suppose the regression errors were homoskedastic. Would this affect the validity of the
confidence interval using the heteroscedastic-robust standard errors? Explain.
34
Week 7
Homework solutions
Solution H.7.1
From Solution H.6.1 on page 24: s_X² = 5.952, β̂1 = 2.000, s_û = 2.774
Conditions and assumptions
• model: Y = β0 + β1 X + u (k = 1)
• three LS assumptions (Key Concept 4.3):
(1) E(u_i | X_i) = 0 for all i = 1, …, n
(2) (X_i, Y_i) are i.i.d. for i = 1, …, n
(3) Large outliers of (X_i, Y_i) are unlikely for all i = 1, …, n
• two extended least squares assumptions (‘Classical’ inference):
(4) homoskedasticity: constant var(u_i | X_i)
(5) normally distributed: u_i | X_i ~ N (n = 15, so the sample size is small)
Hypotheses
H0: β1 = 0 versus H1: β1 ≠ 0
Test statistic and its distribution
T = (β̂1 − β1,0) / SE(β̂1) ~ t[df = n − 2 = 13], where SE(β̂1) = s_û / √((n − 1) s_X²) (homoskedastic)
Rejection region
α = 0.10 two-tailed ⟹ T ≥ t_crit = t_{0.05, 13} = 1.771 or T ≤ −t_crit = −1.771
Sample outcome
t_obs = (β̂1 − β1,0) / SE(β̂1) = (2.000 − 0) / (2.774 / √((15 − 1) ∙ 5.952)) = 6.58
Confrontation and decision
t_obs ≥ t_crit ⟹ reject H0
Conclusion
Given a significance level of 10%, there is sufficient evidence to conclude that X has an effect on Y.
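The computation in the solution can be replicated from the summary statistics alone; a minimal sketch:

```python
import math

# Summary statistics carried over from Solution H.6.1
n, s2_x, b1, s_u = 15, 5.952, 2.000, 2.774

se_b1 = s_u / math.sqrt((n - 1) * s2_x)   # homoskedasticity-only SE
t_obs = (b1 - 0) / se_b1                  # test H0: beta1 = 0

assert round(t_obs, 2) == 6.58            # exceeds t_crit = 1.771, so reject H0
```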
Solution H.7.2
From Solution H.6.2 on page 25: s_X² = 13.982, β̂1 = 50.529, s_û = 76.19
Conditions and assumptions
• model: Y = β0 + β1 X + u (k = 1)
• three LS assumptions (Key Concept 4.3):
(1) E(u_i | X_i) = 0 for all i = 1, …, n
(2) (X_i, Y_i) are i.i.d. for i = 1, …, n
(3) Large outliers of (X_i, Y_i) are unlikely for all i = 1, …, n
• two extended least squares assumptions (‘Classical’ inference):
(4) homoskedasticity: constant var(u_i | X_i)
(5) normally distributed: u_i | X_i ~ N (n = 8, so the sample size is small)
35
Hypotheses
H0: β1 = 0 versus H1: β1 > 0
Test statistic and its distribution
T = (β̂1 − β1,0) / SE(β̂1) ~ t[df = n − 2 = 6], where SE(β̂1) = s_û / √((n − 1) s_X²) (homoskedastic)
Rejection region
α = 0.01 one-tailed ⟹ T ≥ t_crit = t_{0.01, 6} = 3.143
Sample outcome
t_obs = (β̂1 − β1,0) / SE(β̂1) = (50.529 − 0) / (76.19 / √((8 − 1) ∙ 13.982)) = 6.56
Confrontation and decision
t_obs ≥ t_crit ⟹ reject H0
Conclusion
Given the significance level of 1%, there is enough evidence to indicate that the number of ads has a positive effect on the number of customers.
p-value
Table, df = 6: p-value = P(T ≥ 6.561) < P(T ≥ 3.707) = 0.005
Solution H.7.3
From Solution H.6.3 on page 26: s_X² = 13641.307, β̂1 = 0.09094, s_û = 10.5293.
a)
Conditions and assumptions
• model: Y = β0 + β1 X + u (k = 1)
• three LS assumptions (Key Concept 4.3):
(1) E(u_i | X_i) = 0 for all i = 1, …, n
(2) (X_i, Y_i) are i.i.d. for i = 1, …, n
(3) Large outliers of (X_i, Y_i) are unlikely for all i = 1, …, n
• extended least squares assumption (n = 50, so the sample size appears to be sufficiently large):
(4) homoskedasticity: constant var(u_i | X_i)
Hypotheses
H0: β1 = 0 versus H1: β1 ≠ 0
Test statistic and its distribution
T = (β̂1 − β1,0) / SE(β̂1) ~ t[df = n − 2 = 48 ≈ 50], where SE(β̂1) = s_û / √((n − 1) s_X²) (homoskedastic)
Rejection region
T ≥ t_crit = t_{0.025, 50} = 2.009 or T ≤ −t_crit = −2.009
Sample outcome
t_obs = (β̂1 − β1,0) / SE(β̂1) = (0.09094 − 0) / (10.5293 / √((50 − 1) ∙ 13641.307)) = 7.061
36
Confrontation and decision
t_obs ≥ t_crit ⟹ reject H0
Conclusion
Given a significance level of 5%, there is sufficient evidence to conclude that EXERCISE has an effect on RED.
p-value
p-value = 2 ∙ P(T ≥ t_obs = 7.061) < 2 ∙ P(T ≥ 2.678) = 2 ∙ 0.005 = 0.01
b)
SE(β̂1) = s_û / √((n − 1) s_X²) = 10.5293 / √((50 − 1) ∙ 13641.307) = 0.01288
The 99% confidence interval for β1:
[β̂1 − t_{0.005, 50} ∙ SE(β̂1), β̂1 + t_{0.005, 50} ∙ SE(β̂1)]
= [0.09094 − 2.678 ∙ 0.01288, 0.09094 + 2.678 ∙ 0.01288]
= [0.056, 0.125]
c) In 99% of all possible samples, the constructed confidence interval contains the true value of β1.
(Or: with 99% confidence, the true value of β1 lies between 0.056 and 0.125.)
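The interval in part b can be reproduced in a few lines; a sketch using the same summary statistics and the tabulated critical value:

```python
import math

# From Solution H.6.3: n = 50, s_X^2 = 13641.307, b1 = 0.09094, s_u = 10.5293
n, s2_x, b1, s_u = 50, 13641.307, 0.09094, 10.5293
t_crit = 2.678                      # t_{0.005, df ~ 50} from the table

se_b1 = s_u / math.sqrt((n - 1) * s2_x)
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1

assert round(se_b1, 5) == 0.01288
assert round(lo, 3) == 0.056 and round(hi, 3) == 0.125
```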
Solution H.7.4
a. The following line of STATA code: generate y=4+3*x+100*rnormal() implies that β1 = 3. Therefore, the estimation error equals
β̂1 − β1 = 1.778623 − 3 = −1.221377
b. The 95% confidence interval for β1 in the STATA output is [−2.596326, 6.153572], which lies completely below the test value of 7. Hence, there is sufficient statistical evidence to infer that β1 is smaller than 7.
Because the two-sided confidence interval is used to test a one-sided alternative hypothesis, there is only a one-directional risk, and thus we are testing at a significance level of 5%/2 = 2.5%.
c. In the simulation all Gauss-Markov assumptions are satisfied (Key Concept 5.5) and u_i ~ N(0, 100²).
Hence, β̂1 ~ N(3, σ²_β̂1), with the homoskedasticity-only variance σ²_β̂1.
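The STATA simulation can be mirrored in Python to see this sampling distribution directly. The sample size, the design for x, and the replication count below are assumptions; only the data-generating line y = 4 + 3x + 100·N(0, 1) comes from the exercise:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 2000                 # assumed sample size and replication count

x = rng.uniform(0, 10, n)           # assumed design, fixed across replications
slopes = []
for _ in range(reps):
    y = 4 + 3 * x + 100 * rng.normal(size=n)   # same DGP as the STATA line
    slopes.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))
slopes = np.array(slopes)

# The slope estimator is unbiased: its distribution is centered at beta1 = 3
assert abs(slopes.mean() - 3) < 0.5
```

Individual draws such as 1.778623 can land far from 3 (as in part a) even though the estimator is unbiased.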
Solution H.7.5
a. Under the three LS assumptions (Key Concept 4.3):
(1) E(u_i | X_i) = 0 for all i = 1, …, n
(2) (X_i, Y_i) are i.i.d. for i = 1, …, n
(3) Large outliers of (X_i, Y_i) are unlikely for all i = 1, …, n
and assuming that the sample size n is large (‘Modern’ inference).
Under these conditions, the LS estimators are unbiased and consistent, and the t-statistics have (approximately) a standard normal distribution. There is no result on efficiency.
b. Under the three LS assumptions (Key Concept 4.3):
(1) E(u_i | X_i) = 0 for all i = 1, …, n
(2) (X_i, Y_i) are i.i.d. for i = 1, …, n
37
(3) Large outliers of (X_i, Y_i) are unlikely for all i = 1, …, n
and two extended LS assumptions:
(4) homoskedasticity: constant var(u_i | X_i)
(5) normally distributed: u_i | X_i ~ N
When these 5 LS assumptions hold, the LS estimators have an exact normal sampling distribution, and the homoskedasticity-only t-statistics have an exact Student t distribution. Furthermore, the first 4 assumptions imply that the LS estimators are Best Linear conditionally Unbiased Estimators (BLUE), so also efficient.
c. If the three LS assumptions (Key Concept 4.3):
(1) E(u_i | X_i) = 0 for all i = 1, …, n
(2) (X_i, Y_i) are i.i.d. for i = 1, …, n
(3) Large outliers of (X_i, Y_i) are unlikely for all i = 1, …, n
and one extended LS assumption:
(4) homoskedasticity: constant var(u_i | X_i)
hold, the LS estimators are Best Linear conditionally Unbiased Estimators (BLUE). In other words, the Gauss-Markov theorem states that the LS estimators have the smallest conditional variance of all linear conditionally unbiased estimators.
Solution H.7.6
a. β̂0 ± t_{0.025, 28} ∙ SE(β̂0) = 43.2 ± 2.048 ∙ 10.2 = 43.2 ± 20.89 gives (22.31, 64.09).
b. The t-statistic is t_act = (β̂1 − β1,0) / SE(β̂1) = (61.5 − 55) / 7.4 = 0.878, which is less (in absolute value) than the critical value t_crit = t_{0.025, 28} = 2.048. Thus, the null hypothesis is not rejected at the 5% level.
c. The one-sided 5% critical value is t_{0.05, 28} = 1.701.
t_act is less than this critical value, so the null hypothesis is not rejected at the 5% level.
Solution H.7.7
a. False. The unbiasedness of the OLS estimator has nothing to do with heteroscedasticity. In fact, to show
unbiasedness we only needed the LS Assumptions listed in Key Concept 4.3.
b. True. If one uses homoscedasticity-only standard errors, then ignoring the nonconstant variance of the
error term leads to incorrect standard errors for the OLS estimators. The use of robust standard errors
acknowledges the presence of heteroscedasticity.
c. False. Note that the p-value is the smallest significance level at which one could reject the null hypothesis. Here the p-value is larger than the significance level, so we could not reject at the 1% level.
Solution H.7.8
a. Minimizing Σ_{i=1}^n (Y_i − α̂)² over α̂ gives the first-order condition −2 Σ_{i=1}^n (Y_i − α̂) = 0 ⟹ α̂ = Ȳ
b. Of course, we consider the population variance:
var(α̂) = var(Ȳ) = var((1/n) Σ_{i=1}^n Y_i) = (1/n²) var(Σ_{i=1}^n Y_i) = (1/n²) Σ_{i=1}^n var(Y_i)
Now we make use of the assumption that Y_i is i.i.d. (LS assumption 2) and that Y_i = α + u_i; then
var(α̂) = (1/n²) Σ_{i=1}^n var(α + u_i) = (1/n²) Σ_{i=1}^n var(u_i), since var(α) = 0
38
As we assumed that Y_i is identically distributed, var(u_i) = var(Y_i) is constant, say σ². (Note: as there is no independent variable X_i, we do not work with the conditional variance var(u_i | X_i).)
Hence,
var(α̂) = (1/n²) Σ_{i=1}^n σ² = σ²/n
which is the population variance divided by the sample size n.
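Both results (α̂ = Ȳ and var(α̂) = σ²/n) can be checked by simulation; a sketch with illustrative values for α and σ:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 25, 20_000
alpha, sigma = 7.0, 2.0                 # illustrative values

Y = alpha + sigma * rng.normal(size=(reps, n))
alpha_hats = Y.mean(axis=1)             # the LS estimator is the sample mean

# alpha_hat is unbiased and its sampling variance is close to sigma^2 / n
assert abs(alpha_hats.mean() - alpha) < 0.01
assert abs(alpha_hats.var() - sigma**2 / n) < 0.01
```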
Solution H.7.9
a. The question asks whether the variability in test scores in large classes is the same as the variability in small classes. It is hard to say. On the one hand, teachers in small classes might be able to spend more time bringing all of the students along, reducing the poor performance of particularly unprepared students. On the other hand, most of the variability in test scores might be beyond the control of the teacher.
b. Formula (5.3) is valid under heteroskedasticity or homoskedasticity; thus inferences are valid in either case.
39
Week 7
Tutorial exercises
Question T.7.1 (S&W 5.2)
Suppose that a researcher, using wage data on 200 randomly selected male workers and 240 female
workers, estimates the OLS regression
Ŵage = 10.73 + 1.78 ∙ Male,  R² = 0.09,  SER = 3.8
        (0.16)   (0.29)
(between parentheses: heteroscedastic-robust standard errors)
where Wage is measured in dollars per hour, and Male is a binary variable that is equal to 1 if the person
is a male and 0 if the person is a female. Define the wage gender gap as the difference in mean earnings
between men and women.
a. What is the estimated gender gap?
b. Is the estimated gender gap significantly different from 0?
(Compute the p-value for testing the null hypothesis that there is no gender gap.)
c. Construct a 95% confidence interval for the gender gap.
d. In the sample, what is the mean wage of women? Of men?
e. Another researcher uses these same data but regresses Wage on Female, a variable that is equal to 1 if the person is female and 0 if the person is a male. What are the regression estimates calculated from this regression?
Ŵage = ______ + ______ ∙ Female,  R² = ______,  SER = ______
40
Question T.7.3 (S&W 5.7, modified)
Suppose (X_i, Y_i) satisfy the three least squares assumptions in Key Concept 4.3. A random sample of size n = 250 is drawn and yields
Ŷ = 5.4 + 3.2 X,  R² = 0.26,  SER = 6.2
    (3.1)  (1.5)
(between parentheses: homoskedasticity only standard errors)
a) Test if there is sufficient evidence to conclude that X has an effect on Y. Use α = 0.05 and the assumption of homoskedasticity.
b) Construct a 95% confidence interval for β1.
c) Suppose you learned that X_i and Y_i were independent. Would you be surprised? Explain.
d) Suppose X_i and Y_i are independent and many samples of size n = 250 are drawn, the model is estimated for each sample, and parts a and b are answered. In what fraction of the samples would the null hypothesis from part a be rejected? In what fraction of samples would the value β1 = 0 be included in the confidence interval from part b?
41