STAT2 Resit Exam 2023-01-12
For each test, we have drawn random samples: either two independent samples $X$ and $Y$, or one sample of matched pairs $(X, Y)$, whichever is appropriate for the specified test.
Instructions
The tables below show the observed values (the samples) and a number of empty cells. In these empty cells, you must write the rank numbers associated with the observed values. There may be more cells than strictly necessary; these may remain empty or may be used for something else.
Example
X       2   3   5   7  11  13  17  19
rank    1   2   3   4   5   6   7   8
1a. Wilcoxon Signed Rank Sum Test
Determine the ranks used by the Wilcoxon Signed Rank Sum Test for the data in this table:
X      10   6  11  11   1   4   6   3
Y       8   9   5   1   7   2   3   5
(You might not need to use all empty cells.)
Now, calculate the Wilcoxon Signed Rank Sum. Write the calculation in the answer box: [ANSWER BOX]
X                 10     6    11    11     1     4     6     3
Y                  8     9     5     1     7     2     3     5
difference X−Y     2    −3     6    10    −6     2     3    −2
rank of |X−Y|      2   4.5   6.5     8   6.5     2   4.5     2
The Wilcoxon Signed Rank Sum: $T^+ = 2 + 6.5 + 8 + 2 + 4.5 = 23$
or: $T^- = 4.5 + 6.5 + 2 = 13$
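As a cross-check of the ranks and the two sums, here is a minimal Python sketch using only the standard library. The helper `average_ranks` is a hypothetical name introduced for this sketch; it gives tied values the mean of the positions they occupy. Zero differences would be dropped before ranking, which is the usual convention (none occur in this table).

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions start..end -> ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


x = [10, 6, 11, 11, 1, 4, 6, 3]
y = [8, 9, 5, 1, 7, 2, 3, 5]

# Differences X - Y; zero differences (none here) are discarded
diffs = [a - b for a, b in zip(x, y) if a != b]
ranks = average_ranks([abs(d) for d in diffs])

t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
print(ranks)            # [2.0, 4.5, 6.5, 8.0, 6.5, 2.0, 4.5, 2.0]
print(t_plus, t_minus)  # 23.0 13.0
```

A useful sanity check is that $T^+ + T^- = n(n+1)/2 = 36$ for $n = 8$ nonzero differences.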
1b. Spearman’s Rank Correlation Coefficient
Determine the ranks of the observed values below, and write them in the empty cells of this table:
X       7   1  12   8   3  12   5   7
Y       C   G   E   H   E   A   B   I
(You might not need to use all empty cells.)
Now, calculate Spearman's Rank Correlation coefficient (2 decimals). Write a short calculation in the answer box: [ANSWER BOX]
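Spearman's rank correlation coefficient is the correlation of the ranks. With sample covariance of the ranks $s_{XY} = -1.1786$ and sample variances $s_X^2 = 5.8571$ and $s_Y^2 = 5.9286$:
$$r_s = \frac{s_{XY}}{s_X \, s_Y} = \frac{-1.1786}{\sqrt{5.8571} \cdot \sqrt{5.9286}} = -0.20$$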
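The ranks and the coefficient can be reproduced with a short Python sketch. It defines its own copy of a hypothetical `average_ranks` helper (mean rank for ties), ranks $X$ numerically and $Y$ alphabetically (an assumption that matches the letters in the table), and applies the formula for $r_s$ to the ranks with divisor $n - 1$.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


x = [7, 1, 12, 8, 3, 12, 5, 7]
y = ["C", "G", "E", "H", "E", "A", "B", "I"]  # letters are ranked alphabetically

rx = average_ranks(x)
ry = average_ranks(y)
n = len(x)
mx, my = sum(rx) / n, sum(ry) / n

# Sample covariance and sample variances of the ranks (divisor n - 1)
s_xy = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / (n - 1)
s_x2 = sum((a - mx) ** 2 for a in rx) / (n - 1)
s_y2 = sum((b - my) ** 2 for b in ry) / (n - 1)

r_s = s_xy / (sqrt(s_x2) * sqrt(s_y2))
print(rx)  # [4.5, 1.0, 7.5, 6.0, 2.0, 7.5, 3.0, 4.5]
print(ry)  # [3.0, 6.0, 4.5, 7.0, 4.5, 1.0, 2.0, 8.0]
print(round(s_xy, 4), round(s_x2, 4), round(s_y2, 4), round(r_s, 2))
```

Here $s_{XY} = -8.25/7 = -1.178571\ldots$, which rounds to $-1.1786$ at four decimals.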
1c. Wilcoxon Rank Sum Test
Determine the ranks of the observed values below, and write them in the empty cells of this table:
X       J   F   K   K   A   D   F   C
Y       H   I   E   A   G   B   C   E
(You might not need to use all empty cells.)
Now, calculate the Wilcoxon Rank Sum as used in a test. Write the calculation in the answer box: [ANSWER BOX]
The Wilcoxon Rank Sum: $T_1 = 14 + 9.5 + 15.5 + 15.5 + 1.5 + 6 + 9.5 + 4.5 = 76$
or: $T_2 = 12 + 13 + 7.5 + 1.5 + 11 + 3 + 4.5 + 7.5 = 60$
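For the rank sum test, both samples are ranked together and the pooled ranks are then split back by group. A minimal Python sketch, with its own copy of a hypothetical `average_ranks` helper (mean rank for ties, letters ranked alphabetically):

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


x = ["J", "F", "K", "K", "A", "D", "F", "C"]
y = ["H", "I", "E", "A", "G", "B", "C", "E"]

# Rank the pooled sample, then split the ranks back into the two groups
pooled = average_ranks(x + y)
ranks_x, ranks_y = pooled[:len(x)], pooled[len(x):]

t1, t2 = sum(ranks_x), sum(ranks_y)
print(ranks_x)  # [14.0, 9.5, 15.5, 15.5, 1.5, 6.0, 9.5, 4.5]
print(ranks_y)  # [12.0, 13.0, 7.5, 1.5, 11.0, 3.0, 4.5, 7.5]
print(t1, t2)   # 76.0 60.0
```

As a check, $T_1 + T_2 = N(N+1)/2 = 136$ for the $N = 16$ pooled observations.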
Question 2 – Dependence / independence (18)
(this is exercise 10, from the homework exercises of week 3, in Syllabus A – weeks 1-2-3)
New drugs are usually tested by giving a randomly selected group of people the drug, and another
randomly selected group of people (named the control group) a placebo. Each person is then asked
whether she or he suffered serious side effects. Suppose that for a new drug, the following data
were collected:
                        serious side effects
medication            suffered    did not suffer
new drug                    41               165
placebo                     28               161
Use a $\chi^2$-test to determine whether we can conclude, at the 10% significance level, that differences exist between the new drug and the placebo in terms of reported side effects. Follow these steps:
[1] Assumptions and conditions, [2] Hypotheses, [3] Test statistic and its distribution, [4] Rejection
region, [5] Sample outcome, [6] Confrontation and decision, [7] Conclusion
Answer 2
[1] Assumptions and conditions
• Random sample
• Available: nominal data (two yes/no variables)
• Required: at least nominal data
• All expected frequencies $e_i \geq 5$
[2] Hypotheses
$H_0$: the two classifications are independent
$H_1$: the two classifications are dependent
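Steps [3] to [6] are not written out in this answer. As a hedged sketch (not the official model answer), the Python code below computes the expected frequencies, the Pearson $\chi^2$ statistic without continuity correction, its degrees of freedom and the 10% critical value. It relies on the fact that with df = 1 a $\chi^2$ variable is the square of a standard normal variable, so `statistics.NormalDist` from the standard library suffices; for larger tables a $\chi^2$ quantile function such as `scipy.stats.chi2.ppf` would be needed instead.

```python
from math import erfc, sqrt
from statistics import NormalDist

# Observed counts: rows = new drug, placebo; columns = suffered, did not suffer
observed = [[41, 165],
            [28, 161]]
alpha = 0.10

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under H0 (independence): e_ij = row_i * col_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic, no continuity correction
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # = 1 for a 2x2 table

# With df = 1, chi2 = Z^2: critical value z_{alpha/2}^2, p-value erfc(sqrt(chi2 / 2))
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2
p_value = erfc(sqrt(chi2 / 2))

print([[round(e, 2) for e in row] for row in expected])
print(round(chi2, 3), df, round(critical, 3), round(p_value, 3))
print("reject H0" if chi2 > critical else "do not reject H0")  # do not reject H0
```

The statistic (about 1.77) is smaller than $\chi^2_{0.10,1} \approx 2.706$, which agrees with the conclusion in step [7].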
[7] Conclusion
At the 10% significance level, there is not sufficient evidence to infer that differences exist in reported side effects between the new drug and the placebo.
Question 3 – Confidence interval (9)
A statistician uses two random samples of clothes hangers to test the maximum weight (in kilograms) that two different brands of clothes hangers can carry. The results:
                                  brand 1    brand 2
sample size                            18         16
mean max. weight (kg)                10.4       12.2
st. deviation max. weight (kg)        7.2        4.4
s²/n                                    …          …
Use a 90% confidence interval to estimate the difference between the mean maximum weights (3 decimals).
Answer 3
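$$\frac{s_1^2}{n_1} = 2.88, \qquad \frac{s_2^2}{n_2} = 1.21$$

$$\text{df} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{(2.88 + 1.21)^2}{\dfrac{2.88^2}{18 - 1} + \dfrac{1.21^2}{16 - 1}} = 28.57 \approx 29$$

$$1 - \alpha = 90\% \implies \frac{\alpha}{2} = 5\%$$

$$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,\text{df}} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 10.4 - 12.2 \pm 1.699 \cdot \sqrt{2.88 + 1.21} = -1.8 \pm 3.436$$

$$\implies -5.236 < \mu_1 - \mu_2 < 1.636$$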
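The arithmetic of this interval can be checked with a short Python sketch. The critical value $t_{0.05,29} = 1.699$ is hard-coded from a t-table, as in the answer, because the Python standard library has no t quantile function.

```python
from math import sqrt

n1, mean1, sd1 = 18, 10.4, 7.2  # brand 1
n2, mean2, sd2 = 16, 12.2, 4.4  # brand 2

v1 = sd1 ** 2 / n1  # s1^2 / n1
v2 = sd2 ** 2 / n2  # s2^2 / n2

# Welch-Satterthwaite approximation of the degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# t_{0.05, 29} read from a t-table (the answer rounds df = 28.57 to 29)
t_crit = 1.699

margin = t_crit * sqrt(v1 + v2)
diff = mean1 - mean2
lower, upper = diff - margin, diff + margin
print(round(v1, 2), round(v2, 2), round(df, 2))  # 2.88 1.21 28.57
print(round(lower, 3), round(upper, 3))          # -5.236 1.636
```

Because the interval contains 0, it does not indicate a difference between the two brands' mean maximum weights at this confidence level.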
Question 4 – Regression I (2 + 2 + 5 + 7 + 4 + 2 + 7 = 29)
In a number of cities in the world, the cost of living has been measured yearly as an index. This index
expresses the cost of living in a city as a percentage of the cost of living in New York City. Thus for
New York City the value of the index is 100 (representing 100%).
rank2007  city            index2007  index2006
       1  Moscow              134.4      123.9
       2  London              126.3      110.6
       3  Seoul               122.4      121.7
       4  Tokyo               122.1      119.1
       5  Hong Kong           119.4      116.3
       6  Copenhagen          110.2      101.1
       7  Geneva              109.8      103.0
       8  Osaka               108.4      108.3
       9  Zurich              107.6      100.8
      10  Oslo                105.8      100.0
      11  Milan               104.4       96.9
      12  St Petersburg       103.0       99.7
      13  Paris               101.4       93.1
      14  Singapore           100.4       92.0
      15  New York            100.0      100.0
------------------------------------------------------------------------------
| Robust
index2007 | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
index2006 | .94254 .1069 8.82 0.000 .71161 1.1735
_cons | 12.017 10.80 1.11 0.286 -11.312 35.346
------------------------------------------------------------------------------
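The point estimates in this output can be reproduced by ordinary least squares on the table. The sketch below uses only the standard library and computes the slope and intercept; it does not reproduce the robust standard errors, which require a heteroskedasticity-robust variance formula.

```python
index2007 = [134.4, 126.3, 122.4, 122.1, 119.4, 110.2, 109.8, 108.4,
             107.6, 105.8, 104.4, 103.0, 101.4, 100.4, 100.0]
index2006 = [123.9, 110.6, 121.7, 119.1, 116.3, 101.1, 103.0, 108.3,
             100.8, 100.0, 96.9, 99.7, 93.1, 92.0, 100.0]

n = len(index2007)
x_bar = sum(index2006) / n
y_bar = sum(index2007) / n

# Sums of cross-products and squares around the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(index2006, index2007))
s_xx = sum((x - x_bar) ** 2 for x in index2006)

b1 = s_xy / s_xx         # coefficient on index2006
b0 = y_bar - b1 * x_bar  # intercept (_cons)
print(round(b1, 5), round(b0, 3))  # 0.94254 12.017
```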
It is assumed that a random sample $(Y_i, X_i)$, $i = 1, \ldots, n$, is drawn from the linear regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$, and that $(Y_i, X_i)$ has finite fourth moments. It is further assumed that $E(u_i) = 0$ and $\text{var}(u_i) = \sigma^2$, and that $u_i$ is distributed independently of $X_i$.
It is also known that $X_i$ is a random variable that by definition is always positive ($X_i > 0$), which implies that $E(X_i) = \mu_X > 0$, and that its effect on $Y_i$ is strictly positive, so $\beta_1 > 0$. Suppose that an econometrician is interested in estimating the constant $\beta_0$ and considers using the simple sample mean $b_0 = \bar{Y}$ as an estimator of $\beta_0$.
5a. It follows from the assumptions that $E(u_i \mid X_i) = 0$ and $\text{var}(u_i \mid X_i) = \sigma^2$. Explain why.
Since $u_i$ is distributed independently of $X_i$, the conditional distribution of $u_i$ given $X_i$ is the same as the unconditional distribution of $u_i$. In particular, the conditional expected value and variance equal the unconditional expected value and variance, so $E(u_i \mid X_i) = E(u_i) = 0$ and $\text{var}(u_i \mid X_i) = \text{var}(u_i) = \sigma^2$.
5b. Derive $E(b_0 \mid X_1, \ldots, X_n)$. Next, derive $E(b_0)$ and show that it is not equal to $\beta_0$.
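$$
\begin{aligned}
E(b_0 \mid X_1, \ldots, X_n) &= E(\bar{Y} \mid X_1, \ldots, X_n) \\
&= E\Big(\frac{1}{n}\sum_{i=1}^{n} Y_i \;\Big|\; X_1, \ldots, X_n\Big) \\
&= E\Big(\frac{1}{n}\sum_{i=1}^{n} (\beta_0 + \beta_1 X_i + u_i) \;\Big|\; X_1, \ldots, X_n\Big) \\
&= \frac{1}{n}\sum_{i=1}^{n} \big(\beta_0 + \beta_1 X_i + E(u_i \mid X_1, \ldots, X_n)\big) \\
&= \frac{1}{n}\sum_{i=1}^{n} \big(\beta_0 + \beta_1 X_i + E(u_i \mid X_i)\big) \quad \text{as the observations are i.i.d., so } (X_i, u_i) \text{ is independent of } X_j \text{ for } j \neq i \\
&= \frac{1}{n}\sum_{i=1}^{n} (\beta_0 + \beta_1 X_i + 0) \\
&= \beta_0 + \beta_1 \bar{X}
\end{aligned}
$$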
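So, by the law of iterated expectations:
$$E(b_0) = E[E(b_0 \mid X_1, \ldots, X_n)] = E[\beta_0 + \beta_1 \bar{X}] = \beta_0 + \beta_1 E[\bar{X}] = \beta_0 + \beta_1 \mu_X \neq \beta_0,$$
since $\beta_1 \mu_X > 0$.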
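The conditional result $E(b_0 \mid X_1, \ldots, X_n) = \beta_0 + \beta_1 \bar{X}$ can be illustrated by Monte Carlo: hold one set of regressors fixed, redraw only the errors many times, and average $b_0 = \bar{Y}$. The parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$), the uniform distribution of $X$ and the normal errors are hypothetical choices made only for this sketch.

```python
import random

random.seed(1)

# Hypothetical parameter values, chosen only for illustration
beta0, beta1, sigma = 2.0, 0.5, 1.0
n, reps = 20, 20_000

# Hold the regressors fixed: we are conditioning on X_1, ..., X_n
xs = [random.uniform(1.0, 3.0) for _ in range(n)]
x_bar = sum(xs) / n

total = 0.0
for _ in range(reps):
    # Fresh errors each replication, mean 0 and independent of X
    ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]
    total += sum(ys) / n  # b0 = Y-bar in this replication

avg_b0 = total / reps
# avg_b0 should be close to beta0 + beta1 * x_bar, not to beta0
print(round(avg_b0, 3), round(beta0 + beta1 * x_bar, 3), beta0)
```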
5c. Based on the result in part b, can you tell whether $b_0$ is biased, and whether $b_0$ is consistent? Explain exactly what you check for.
$E(b_0) = \beta_0 + \beta_1 \mu_X > \beta_0$ (as $\beta_1 > 0$ and $\mu_X > 0$), so $b_0$ has a positive bias.
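$E(b_0) = \beta_0 + \beta_1 \mu_X$ does not depend on $n$, so it does not converge to $\beta_0$ as $n \to \infty$. This means that $b_0$ has an asymptotic bias, so it cannot be consistent. Indeed, by the law of large numbers, $b_0 = \bar{Y} \xrightarrow{p} E(Y_i) = \beta_0 + \beta_1 \mu_X \neq \beta_0$.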
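A small simulation can illustrate the inconsistency. Under hypothetical parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$, $X$ uniform on $[1, 3]$ so $\mu_X = 2$, normal errors), $\bar{Y}$ settles near $\beta_0 + \beta_1 \mu_X = 3$ as $n$ grows, while the least-squares intercept settles near $\beta_0 = 2$. This is a sketch for intuition, not a proof.

```python
import random

random.seed(2)

# Hypothetical parameter values; X ~ Uniform(1, 3), so mu_X = 2
beta0, beta1, sigma, mu_x = 2.0, 0.5, 1.0, 2.0


def estimates(n):
    """Return (Y-bar, OLS intercept) from one simulated sample of size n."""
    xs = [random.uniform(1.0, 3.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar, y_bar - b1 * x_bar


results = {n: estimates(n) for n in (100, 10_000, 100_000)}
for n, (y_bar, b0_ols) in results.items():
    print(n, round(y_bar, 3), round(b0_ols, 3))

print("beta0 =", beta0, "| beta0 + beta1 * mu_X =", beta0 + beta1 * mu_x)
```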
5d. Is the least-squares estimator of $\beta_0$ the BLUE? Explain.
The least-squares assumptions are satisfied:
(1) $E(u_i \mid X_i) = 0$ follows from the assumptions (see 5a);
(2) $(Y_i, X_i)$, $i = 1, \ldots, n$, is a random sample, so the observations are i.i.d.;
(3) $(Y_i, X_i)$ has finite fourth moments.
In addition, $\text{var}(u_i \mid X_i) = \sigma^2$ is constant (homoskedasticity).
Under these conditions, the Gauss-Markov theorem implies that the LS estimator of $\beta_0$ is BLUE.
~ The End ~