University of Toronto Scarborough Department of Computer and Mathematical Sciences Final Exam, Winter - 2015
TUTORIAL:
Aids Allowed:
• A handwritten cheat-sheet covering both sides of TWO A4/letter sized papers. You
need to submit your cheat-sheet with your answer-sheet after the exam.
All your work must be presented clearly in order to get credit. Answer alone (even though
correct) will only qualify for ZERO credit. Please show your work in the space provided;
you may use the back of the pages, if necessary, but you MUST remain organized. Show
your work and answer in the space provided.
There are 13 pages including this page. Please check to see you have all the
pages.
Good Luck!
Question 1 2 3 4 5 6 7 8 9 Total
Points 10 10 10 10 14 10 10 10 16 100
Score
1 (a): Given that the survival times (X), in months, of a group of patients with lung cancer
follow an exponential distribution with probability density function
$$ f_X(x) = \lambda e^{-\lambda x}, \quad x > 0, $$
where λ = 0.07. You want to predict a future value of X using its median. Hence, find
the median survival time of the patients with lung cancer.
[5 Points]
Solution: The median M satisfies
$$ \int_0^M f_X(x)\,dx = \int_0^M \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda M} = 0.5. $$
[3 Points]
$$ M = -\frac{\log(0.5)}{\lambda} = -\frac{\log(0.5)}{0.07} = 9.902103. $$
[2 Points]
1 (b): You are also interested in predicting a future value of X in the form of an interval.
Find the 95% smallest interval of the survival times of the patients with lung cancer.
[5 Points]
Solution: The density is strictly decreasing in x, so its mass is concentrated near zero.
Hence, the smallest interval of X with mass 0.95 is (0, b), where
$$ \int_0^b f_X(x)\,dx = \int_0^b \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda b} = 0.95. $$
[3 Points]
Since $e^{-\lambda b} = 0.05$,
$$ b = -\frac{\log(0.05)}{\lambda} = -\frac{\log(0.05)}{0.07} = 42.79618. $$
[2 Points]
This tells us that 95% of the lung cancer patients have survival times below 42.79618 months.
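Both answers in Question 1 are quantiles of the same exponential distribution, so they can be checked numerically. The following is a minimal sketch; the helper name `exponential_quantile` is an assumption introduced here for illustration and is not part of the exam.

```python
import math

def exponential_quantile(p, rate):
    """Return q such that P(X <= q) = p for X ~ Exponential(rate)."""
    # Solve 1 - exp(-rate * q) = p for q.
    return -math.log(1.0 - p) / rate

rate = 0.07  # lambda given in the question

median = exponential_quantile(0.5, rate)     # answer to 1 (a)
upper_95 = exponential_quantile(0.95, rate)  # right end point b in 1 (b)

print(round(median, 6), round(upper_95, 5))
```

Note that the 0.95 quantile uses log(1 - 0.95) = log(0.05), which is what yields the value 42.79618.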
2 (a): A random variable Y is defined as
$$ Y = \begin{cases} 1 & \text{if the grade of a student is A+} \\ 0 & \text{otherwise,} \end{cases} $$
[5 Points]
where the parameter is θ and the parameter space is Ω = [0, 1]. [3 Points]
2 (b): Suppose that a statistical model is given by the family of Bernoulli(θ) distributions
where θ ∈ Ω = [0, 1]. If your interest is in making inferences about the probability
that two independent observations from this model are NOT the same, then determine
ψ(θ).
[5 Points]
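$$ \psi(\theta) = P(Y_1 \neq Y_2). $$
[2 Points]
That is,
$$ \psi(\theta) = P(Y_1 = 1, Y_2 = 0) + P(Y_1 = 0, Y_2 = 1). $$
[1 Points]
Finally, by independence,
$$ \psi(\theta) = \theta(1-\theta) + (1-\theta)\theta = 2\theta(1-\theta). $$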
[2 Points]
3 (a): Suppose the following sample of waiting times (in minutes) was obtained for customers
in a queue at an automatic banking machine.
15 10 2 3 1 0 4 5
5 3 3 4 2 1 4 55
The median, first and third quartiles are computed as Q2 = 3.5, Q1 = 2.0 and Q3 = 5.0,
respectively. Use the 1.5 × IQR rule to find any outliers in the data. Show your work clearly.
[5 Points]
[1 Points]
[2 Points]
In this problem, IQR = Q3 − Q1 = 3.0, so no sample observation is less than the lower
limit Q1 − 1.5 × IQR = −2.5. On the other hand, the first (15), second (10), and last (55)
sample observations are greater than the upper limit Q3 + 1.5 × IQR = 9.5. According
to the 1.5 × IQR rule, these three observations are potential outliers.
[2 Points]
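The 1.5 × IQR rule can be sketched in a few lines of code. The quartile convention used here (medians of the lower and upper halves of the sorted data) is an assumption chosen because it reproduces the quartiles quoted in the question; other conventions can give slightly different limits. The function name `iqr_outliers` is hypothetical.

```python
from statistics import median

def iqr_outliers(data):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    ordered = sorted(data)
    half = len(ordered) // 2
    # Assumed convention: Q1 and Q3 are the medians of the lower and
    # upper halves of the sorted sample.
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return q1, q3, lower, upper, outliers

waits = [15, 10, 2, 3, 1, 0, 4, 5, 5, 3, 3, 4, 2, 1, 4, 55]
q1, q3, lower, upper, outliers = iqr_outliers(waits)
print(q1, q3, lower, upper, outliers)
```

With the sample above this flags 15, 10 and 55, matching the conclusion of the solution.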
3 (b): Suppose that a statistical model is given by the family of $N(\mu, \sigma_0^2)$ distributions where
$\theta = \mu \in R^1$ is unknown, while $\sigma_0^2$ is known. If your interest is in making inferences
about the third quartile of the true distribution, then determine ψ(θ).
[5 Points]
Solution: Let X be a random variable that follows the $N(\mu, \sigma_0^2)$ distribution, and let $x_{0.75}$
be its third quartile. Then
$$ P(X \le x_{0.75}) = P\left(\frac{X-\mu}{\sigma_0} \le \frac{x_{0.75}-\mu}{\sigma_0}\right) = P\left(Z \le \frac{x_{0.75}-\mu}{\sigma_0}\right) = 0.75, $$
where $Z \sim N(0, 1)$. [2 Points]
We write
$$ \frac{x_{0.75}-\mu}{\sigma_0} = z_{0.75}, $$
where $\Phi(z_{0.75}) = 0.75$, and $\Phi$ is the cumulative distribution function of the N(0, 1) distribution.
[2 Points]
Finally,
$$ \psi(\theta) = x_{0.75} = \mu + \sigma_0 z_{0.75}. $$
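As a numerical illustration of ψ(θ) = μ + σ0 z0.75, the sketch below evaluates the standard normal quantile with Python's `statistics.NormalDist`. The values of `mu` and `sigma0` in the usage line are hypothetical, since the question leaves them symbolic.

```python
from statistics import NormalDist

# z_{0.75}: the point where the standard normal CDF equals 0.75.
z75 = NormalDist().inv_cdf(0.75)

def third_quartile(mu, sigma0):
    """psi(theta) = mu + sigma0 * z_{0.75} for the N(mu, sigma0^2) model."""
    return mu + sigma0 * z75

# Hypothetical values, for illustration only.
example = third_quartile(mu=10.0, sigma0=2.0)
print(round(z75, 5), round(example, 5))
```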
4 (a): Let $(x_1, x_2, \ldots, x_n)$ be an observed sample from a distribution with probability density
function
$$ f_X(x) = \begin{cases} \dfrac{1}{\theta}\, x^{\frac{1}{\theta}-1} & 0 < x \le 1 \\ 0 & \text{otherwise,} \end{cases} $$
where θ > 0 is an unknown parameter. Find the sufficient statistic and MLE of θ. Is
the sufficient statistic minimal sufficient?
[7 Points]
[2 Points]
[4 Points]
Here, $\hat{\theta}$ can be expressed as a function of $T = \prod_{i=1}^{n} x_i$. Hence, T is minimal sufficient
for θ.
[1 Points]
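The intermediate working for this question does not appear in the text above. The following is a sketch of the standard likelihood argument under the stated density; it is offered as one way to reach the result, not as the marking scheme's own derivation.

```latex
\begin{align*}
L(\theta \mid x_1,\ldots,x_n)
  &= \prod_{i=1}^{n} \frac{1}{\theta}\, x_i^{\frac{1}{\theta}-1}
   = \theta^{-n}\, T^{\frac{1}{\theta}-1},
   \qquad T = \prod_{i=1}^{n} x_i, \\
\ell(\theta) &= \log L(\theta \mid x_1,\ldots,x_n)
   = -n\log\theta + \left(\frac{1}{\theta}-1\right)\log T, \\
\ell'(\theta) &= -\frac{n}{\theta} - \frac{\log T}{\theta^{2}} = 0
   \;\Longrightarrow\;
   \hat{\theta} = -\frac{1}{n}\log T = -\frac{1}{n}\sum_{i=1}^{n}\log x_i .
\end{align*}
```

The likelihood depends on the data only through T, so T is sufficient by the factorization theorem, and the last line shows the MLE as a function of T.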
[3 Points]
5 (a): A cancer laboratory is estimating the rate of tumorigenesis in a group of mice. They
have tumor count data for 10 mice given in the following table.
12 9 12 14 13
13 15 8 15 6
Information from other laboratories suggests that the mice have tumor counts that are
approximately Poisson distributed with a mean of λ. Given the following prior
λ ∼ Gamma(120, 10), find the posterior distribution of λ.
[8 Points]
Solution: Given that the tumor count X ∼ Poisson(λ), we write the joint
distribution of the sample as
$$ P_\lambda(x_1, \ldots, x_n) = \prod_{i=1}^{n} P_\lambda(x_i) = \frac{e^{-n\lambda}\lambda^{n\bar{x}}}{x_1! \cdots x_n!}. $$
[2 Points]
[3 Points]
We write
$$ \lambda \mid x_1, \ldots, x_n \sim \text{Gamma}(n\bar{x} + \alpha,\; n + \beta), $$
[2 Points]
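$$ E(\lambda \mid x_1, \ldots, x_n) = \frac{n\bar{x} + \alpha}{n + \beta} = \frac{237}{20} = 11.85, $$
[3 Points]
and
$$ \operatorname{Var}(\lambda \mid x_1, \ldots, x_n) = \frac{n\bar{x} + \alpha}{(n + \beta)^2} = \frac{237}{20^2} = 0.5925, $$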
respectively. [3 Points]
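The conjugate update above is simple enough to verify directly. This is a minimal sketch that assumes the Gamma(α, β) prior is parameterized by shape α and rate β, consistent with the density given in the Appendix; the variable names are introduced here for illustration.

```python
counts = [12, 9, 12, 14, 13, 13, 15, 8, 15, 6]  # tumor counts from the table
alpha_prior, beta_prior = 120, 10               # Gamma(shape, rate) prior

# Poisson likelihood with a Gamma(shape, rate) prior:
# posterior shape = alpha + sum(x_i), posterior rate = beta + n.
alpha_post = alpha_prior + sum(counts)
beta_post = beta_prior + len(counts)

post_mean = alpha_post / beta_post
post_var = alpha_post / beta_post ** 2

print(alpha_post, beta_post, post_mean, post_var)
```

Here the sum of the counts is 117, so the posterior is Gamma(237, 20), in agreement with the mean and variance quoted above.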
6 (a): Let $(x_1, x_2, \ldots, x_{n_1})$ be a random sample from a location normal model $N(\mu, \sigma_0^2)$,
where $\mu \in R^1$ is unknown and $\sigma_0^2 > 0$ is known. You are interested in computing
a γ-confidence interval for μ. Let the length of your confidence interval
be $|CI(n_1)|$. Now you draw another random sample of size $n_2$ (> $n_1$), and obtain a
γ-confidence interval with length $|CI(n_2)|$. Hence, the following relationship holds
[6 Points]
Hence,
$$ |CI(n_1)| = 2 z_{\frac{1+\gamma}{2}} \frac{\sigma_0}{\sqrt{n_1}}. $$
[1 Points]
Similarly,
$$ |CI(n_2)| = 2 z_{\frac{1+\gamma}{2}} \frac{\sigma_0}{\sqrt{n_2}}. $$
[1 Points]
Since $n_2 > n_1$,
$$ |CI(n_1)| > |CI(n_2)|. $$
Hence, the stated relationship does not hold. The correct answer is FALSE.
[2 Points]
6 (b): You want the maximum length of the confidence interval to be |CI|. Find the required
sample size n.
[4 Points]
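Solution: The maximum length of the γ-confidence interval for μ is |CI|, i.e.,
$$ 2 z_{\frac{1+\gamma}{2}} \frac{\sigma_0}{\sqrt{n}} \le |CI|. $$
[2 Points]
Solving for n gives
$$ n \ge \left( 2 z_{\frac{1+\gamma}{2}} \frac{\sigma_0}{|CI|} \right)^2. $$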
[2 Points]
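The bound on n can be turned into a small calculator. The sketch below rounds the bound up to the next integer, since a sample size must be whole; the function name and the numbers in the usage line (σ0 = 20, γ = 0.95, maximum length 10) are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma0, gamma, max_length):
    """Smallest n with 2 * z_{(1+gamma)/2} * sigma0 / sqrt(n) <= max_length."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    return math.ceil((2 * z * sigma0 / max_length) ** 2)

def interval_length(sigma0, gamma, n):
    """Length of the gamma-confidence interval for mu with known sigma0."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    return 2 * z * sigma0 / math.sqrt(n)

# Hypothetical inputs, for illustration only.
n_needed = required_sample_size(sigma0=20, gamma=0.95, max_length=10)
print(n_needed, interval_length(20, 0.95, n_needed))
```

The second function also illustrates part (a): for fixed σ0 and γ, the length shrinks as n grows.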
7 (a): Suppose in the population of students of the course STAB57H3, the mark, out of 100,
in the midterm test is approximately distributed as $N(\mu, 20^2)$. A random sample of
marks of 10 students is given in the following table. Test whether the distributional
assumption is reasonable or not.
14 55 86 35 59
78 41 28 32 24
[5 Points]
The sample size and mean are n = 10 and $\bar{x} = 45.2$. The residual vector is computed
as
$$ r = (r_1 = -31.2,\; r_2 = 9.8,\; \ldots,\; r_{10} = -21.2). $$
Here, $\sum_{i=1}^{10} r_i^2 = 5041.6$, and the discrepancy statistic is
$$ \chi^2(r) = \sum_{i=1}^{10} r_i^2/\sigma_0^2 = 5041.6/20^2 = 12.604. $$
[3 Points]
[2 Points]
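The residuals and the discrepancy statistic can be reproduced from the table of marks. This sketch only computes the statistic; the final comparison against a reference distribution is not shown in the text above and is left out here as well. A standard result is that, when the model holds with known σ0, this statistic follows a chi-squared distribution with n − 1 degrees of freedom.

```python
marks = [14, 55, 86, 35, 59, 78, 41, 28, 32, 24]
sigma0 = 20  # known standard deviation under the assumed model

n = len(marks)
xbar = sum(marks) / n

# Residuals r_i = x_i - xbar and the discrepancy statistic sum(r_i^2) / sigma0^2.
residuals = [x - xbar for x in marks]
sum_sq = sum(r * r for r in residuals)
discrepancy = sum_sq / sigma0 ** 2

print(n, xbar, round(sum_sq, 1), round(discrepancy, 3))
```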
7 (b): Suppose you place a prior $\mu \sim N(70, 4^2)$ on the unknown parameter. Construct a
0.95-credible interval for μ.
[5 Points]
Solution: Given $X \sim N(\mu, \sigma_0^2 = 20^2)$, $\mu \sim N(\mu_0 = 70, \tau_0^2 = 4^2)$, γ = 0.95, and
$z_{\frac{1+\gamma}{2}} = z_{0.975} = 1.959964$ (see Appendix).
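The rest of this solution is not present in the text. As a hedged sketch, the code below applies the standard conjugate result for a normal mean with known variance and a normal prior, and it assumes that the sample is the ten marks from 7 (a). Both the use of that sample and the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

marks = [14, 55, 86, 35, 59, 78, 41, 28, 32, 24]  # assumed: sample from 7 (a)
sigma0_sq = 20 ** 2            # known sampling variance
mu0, tau0_sq = 70, 4 ** 2      # prior mean and prior variance
gamma = 0.95

n = len(marks)
xbar = sum(marks) / n

# Normal-normal conjugate update: precisions add, and the posterior mean
# is the precision-weighted average of the prior mean and the sample mean.
post_var = 1 / (1 / tau0_sq + n / sigma0_sq)
post_mean = post_var * (mu0 / tau0_sq + n * xbar / sigma0_sq)

z = NormalDist().inv_cdf((1 + gamma) / 2)
half_width = z * post_var ** 0.5
interval = (post_mean - half_width, post_mean + half_width)

print(round(post_mean, 4), round(post_var, 4), interval)
```

Under these assumptions the posterior is centered between the sample mean 45.2 and the prior mean 70, closer to the prior because the prior variance is small relative to σ0²/n.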
8 (a): Let (X1 , X2 , · · · , X10 ) be a random sample of size 10 from the Bernoulli(θ) distribution.
Find the value of k such that k X̄(1 − X̄) is an unbiased estimator of ψ(θ) = θ(1 − θ).
[5 Points]
Solution: Here,
[2 Points]
$$ E\big(\bar{X}(1 - \bar{X})\big) = \frac{n-1}{n}\,\theta(1 - \theta) $$
[2 Points]
Hence,
$$ E\left(\frac{n}{n-1}\,\bar{X}(1 - \bar{X})\right) = \theta(1 - \theta) = \psi(\theta). $$
So,
$$ k = \frac{n}{n-1} = \frac{10}{9}. $$
[1 Points]
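The identity E(X̄(1 − X̄)) = ((n − 1)/n) θ(1 − θ) can be checked exactly by summing over the binomial distribution of the number of successes. The sketch below uses exact rational arithmetic; the value θ = 3/10 and the function name are hypothetical choices for illustration.

```python
from fractions import Fraction
from math import comb

def expected_xbar_times_one_minus_xbar(n, theta):
    """Exact E[Xbar * (1 - Xbar)] for n i.i.d. Bernoulli(theta) draws."""
    total = Fraction(0)
    for s in range(n + 1):  # s = number of ones in the sample
        prob = comb(n, s) * theta ** s * (1 - theta) ** (n - s)
        xbar = Fraction(s, n)
        total += prob * xbar * (1 - xbar)
    return total

n = 10
theta = Fraction(3, 10)  # hypothetical value of theta
raw = expected_xbar_times_one_minus_xbar(n, theta)
k = Fraction(n, n - 1)

print(raw, k * raw, theta * (1 - theta))
```

Multiplying by k = 10/9 removes the bias exactly, for this θ and for any other rational θ passed to the function.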
8 (b): Suppose that X, Y, Z are independent N (0, 1) random variables and that U = X + Z,
V = Y + Z. Determine whether or not the variables U and V are related to each other.
[5 Points]
[3 Points]
The variance of U is Var(U) = Var(X) + Var(Z) = 2.
The variance of V is Var(V) = Var(Y) + Var(Z) = 2.
[1 Points]
Hence, the two variables U and V are related to each other. [1 Points]
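The covariance step that supports this conclusion is not shown in the text. A sketch of the standard calculation, using only the independence of X, Y, Z and their unit variances:

```latex
\begin{align*}
\operatorname{Cov}(U, V) &= \operatorname{Cov}(X + Z,\; Y + Z) \\
  &= \operatorname{Cov}(X, Y) + \operatorname{Cov}(X, Z)
   + \operatorname{Cov}(Z, Y) + \operatorname{Var}(Z)
   = 0 + 0 + 0 + 1 = 1, \\
\operatorname{Corr}(U, V) &= \frac{\operatorname{Cov}(U, V)}
   {\sqrt{\operatorname{Var}(U)\operatorname{Var}(V)}}
   = \frac{1}{\sqrt{2 \cdot 2}} = \frac{1}{2}.
\end{align*}
```

A nonzero covariance rules out independence, which is why U and V are related through the shared term Z.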
9 (a): Shown below are the number of galleys for a manuscript (X) and the dollar cost
of correcting typographical errors (Y ) in a random sample of recent orders handled
by a firm specializing in technical manuscripts. Assume that the regression model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ is appropriate, with normally distributed independent error terms
with variance $\sigma^2$.
i: 1 2 3 4 5 6
Xi : 7 12 4 14 25 30
Yi : 128 213 75 250 446 540
The following numbers are provided: $\sum_{i=1}^{6} X_i = 92$, $\sum_{i=1}^{6} Y_i = 1652$, $\sum_{i=1}^{6} X_i^2 = 1930$,
$\sum_{i=1}^{6} Y_i^2 = 620394$, $\sum_{i=1}^{6} X_i Y_i = 34602$. Obtain the least squares estimate of $\beta_0$.
[5 Points]
[2 Points]
[2 Points]
[1 Points]
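The working for this part is not shown in the text. As a sketch, the usual least squares formulas b1 = Sxy / Sxx and b0 = Ȳ − b1 X̄ can be evaluated from the provided sums; the names `s_xx`, `s_xy`, `b0` and `b1` are notation assumed here.

```python
n = 6
sum_x, sum_y = 92, 1652
sum_x2, sum_xy = 1930, 34602

# Corrected sums of squares and cross products.
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

b1 = s_xy / s_xx                   # slope estimate
b0 = sum_y / n - b1 * sum_x / n    # intercept estimate

print(round(b1, 4), round(b0, 4))
```

The same sums can be recomputed from the data table (for example, the X values add up to 92), which is a quick guard against transcription errors.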
9 (b): Given that an unbiased estimate of the error variance $\sigma^2$ is
MSE = 7.001,
[5 Points]
$$ s^2\{b_0\} = MSE\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right] = 7.001\left[\frac{1}{6} + \frac{(92/6)^2}{1930 - 6(92/6)^2}\right] = 2.08282^2. $$
[2 Points]
[2 Points]
[1 Points]
9 (c): The analysis of variance (ANOVA) table for the fitted regression model is given below.
Test the null hypothesis $H_0: \beta_1 = 0$ against the alternative $H_A: \beta_1 \neq 0$.
[3 Points]
Solution: Here, the test statistic is F = 23632.04 with a P-value of approximately 0. Hence, the null
hypothesis $H_0: \beta_1 = 0$ is rejected at both the 0.05 and 0.01 levels of significance.
9 (d): Estimate the coefficient of determination $R^2$. Interpret the result.
[3 Points]
$$ R^2 = \frac{SSR}{SST} = \frac{165515.32}{165543.34} = 0.9998. $$
[2 Points]
Interpretation: The fitted regression model explains 99.98% of the total
variation in the response variable Y.
[1 Points]
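The ANOVA table itself is not reproduced in the text, but its main entries can be rebuilt from the sums given in 9 (a). The sketch below assumes the usual simple linear regression decomposition SST = SSR + SSE with n − 2 error degrees of freedom; small differences from the tabulated values come from rounding.

```python
n = 6
sum_x, sum_y = 92, 1652
sum_x2, sum_y2, sum_xy = 1930, 620394, 34602

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

sst = sum_y2 - sum_y ** 2 / n   # total sum of squares
ssr = s_xy ** 2 / s_xx          # regression sum of squares
sse = sst - ssr                 # error sum of squares

mse = sse / (n - 2)             # error mean square
f_stat = ssr / mse              # F statistic for H0: beta1 = 0 (1 and n-2 df)
r_squared = ssr / sst

print(round(sst, 2), round(ssr, 2), round(f_stat, 2), round(r_squared, 4))
```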
Appendix
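Poisson(λ) probability mass function:
$$ P(X = x \mid \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots $$
Gamma(α, β) probability density function:
$$ f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, $$
where α > 0, β > 0, and $x \in R^{+}$.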
z0.975 = 1.959964.
|INT| = b − a.