Technical University of Denmark

Written examination: 14th of December 2013

Course name and number: Introduction to Statistics, 02402

Exercise I

A box contains 6 notes:

On 1 of the notes there is the number 1

On 2 of the notes there is the number 2
On 2 of the notes there is the number 3
On 1 of the notes there is the number 4

Two notes are drawn at random from the box, and the following random variable is introduced:
X, which describes the number of notes with the number 4 among the 2 drawn. The two notes
are drawn without replacement.

Question I.1 (1) The mean and variance for X, and P (X = 0) become:


X follows the hypergeometric distribution with N = 6, a = 1, and n = 2, so the mean and variance formulas for this distribution are used to find:
$$\mu_X = n \frac{a}{N} = 2/6$$
$$\sigma_X^2 = n \frac{a}{N}\left(1 - \frac{a}{N}\right)\left(\frac{N-n}{N-1}\right) = (2/6)(5/6)(4/5) = 8/36 = 2/9$$

And the hypergeometric probability formula gives:
$$P(X = 0) = \frac{\binom{1}{0}\binom{5}{2}}{\binom{6}{2}} = \frac{5 \cdot 4 \cdot 2}{2 \cdot 6 \cdot 5} = 2/3$$

So the correct answer is:

1  $\mu_X = 1/3$, $\sigma_X^2 = 2/9$ and $P(X = 0) = 2/3$
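As a quick numerical check of the formulas above, the same three quantities can be computed in R. This is only a sketch; the variable names (`n_draw`, `p0`, and so on) are chosen here for illustration and are not part of the exam text.

```r
# Hypergeometric setup: N = 6 notes, a = 1 note carries the number 4, 2 notes drawn
N <- 6; a <- 1; n_draw <- 2
mu     <- n_draw * a / N
sigma2 <- n_draw * (a / N) * (1 - a / N) * (N - n_draw) / (N - 1)
# dhyper(x, m, n, k): m "successes" in the box, n "failures", k draws
p0 <- dhyper(0, m = a, n = N - a, k = n_draw)
c(mean = mu, variance = sigma2, p0 = p0)
```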

Question I.2 (2) The 2 notes are now drawn with replacement. What is the probability that
none of the 2 notes has the number 1 on it?


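The binomial probability rule is used with n = 2 draws and probability 1/6 of drawing the number 1 in each draw (X now counting the notes with the number 1):

$P(X = 0) = (5/6)^2 = 25/36$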

So the correct answer is:

3  25/36
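The binomial probability can be checked directly in R. A minimal sketch, where the name `p_none` is an illustrative choice:

```r
# Drawing with replacement: 2 independent draws, each with probability 1/6 of the number 1
p_none <- dbinom(0, size = 2, prob = 1/6)
p_none
```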

Exercise II

The staffing for answering calls in a company is based on the assumption that 180 phone calls arrive per hour, randomly distributed. In 9 periods of 5 minutes, the following numbers of calls (per 5 minutes) are registered:

15 20 26 11 16 29 22 12 19

If there are 20 calls or more in a period of 5 minutes the capacity is exceeded, and there will
be an unwanted waiting time, hence there is a capacity of 19 calls per 5 minutes.

Question II.1 (3) If the usual Poisson model for such data is used, a P-value for the following hypothesis can be found from the registrations:

H0 : µ = 180

H1 : µ > 180
where µ is the mean for the number of calls per hour, as:

Answer: The P-value is "the probability of the observations given the null hypothesis", that is, the probability under H0 of observing at least as many calls as were actually registered. Under the null hypothesis, the mean number of calls during 45 minutes (the 9 periods of 5 minutes) is λ45min = 0.75 · 180 = 135, and the observed number of calls during the 45 minutes is 170 (the sum of the 9 observations given). So the correct answer is:

5  $P(X \geq 170)$, $X \sim Po(\lambda)$, $\lambda = 135$
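The exam question only asks for the expression, but the P-value itself can be evaluated in R. The following is a sketch under the Poisson model stated above; the variable names are illustrative.

```r
calls    <- c(15, 20, 26, 11, 16, 29, 22, 12, 19)  # the 9 registered 5-minute counts
observed <- sum(calls)                             # total calls in 45 minutes
lambda45 <- 180 * 45 / 60                          # mean under H0 for 45 minutes
# P(X >= observed) = 1 - P(X <= observed - 1)
p_value <- ppois(observed - 1, lambda45, lower.tail = FALSE)
p_value
```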

Question II.2 (4) The probability that the capacity is exceeded in a random period of 5
minutes will be:

Answer: The 60-minute mean of 180 calls corresponds to a 5-minute mean of $\mu_{5min} = 180/12 = 15$, and the event of exceeding capacity is the event of observing at least 20 calls within 5 minutes. So the correct answer is:

1  $P(X \geq 20) = 0.125$, where $X \sim Po(15)$
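The stated probability can be reproduced with the Poisson distribution function in R. A minimal sketch, with `p_exceed` as an illustrative name:

```r
# P(X >= 20) = 1 - P(X <= 19) for X ~ Po(15)
p_exceed <- ppois(19, 15, lower.tail = FALSE)
round(p_exceed, 3)
```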

Question II.3 (5)

If the probability should be at least 99% that all calls will be handled without waiting time for
a randomly selected period of 5 minutes, how large should the capacity per 5 minutes then at
least be?


It is required that

P (All calls will be handled) = P (X ≤ Capacity) ≥ 0.99

where $X \sim Po(15)$. Looking in a Poisson table or using R:

> ppois(22:26,15)
[1] 0.9672558 0.9805354 0.9888352 0.9938151 0.9966881

shows that the first (smallest) capacity level achieving this is 25. So the correct answer is:

4  The capacity must be at least 25 per 5 minutes
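Instead of scanning the cumulative probabilities, the smallest capacity can also be read off with the Poisson quantile function, which returns the smallest integer c with P(X ≤ c) at least the requested probability. A sketch, with `capacity` as an illustrative name:

```r
# Smallest c such that P(X <= c) >= 0.99 for X ~ Po(15)
capacity <- qpois(0.99, 15)
capacity
```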

Exercise III

The length of an aluminum profile is checked by taking a sample of 16 items whose length is
measured. The measurement results from this sample are listed below, all measurements are
in mm:
180.02 180.00 180.01 179.97 179.92 180.05 179.94 180.10
180.24 180.12 180.13 180.22 179.96 180.10 179.96 180.06
From the data we obtain: x̄ = 180.05 and s = 0.0959

The requirements for the length of the profile are:

µL = 180mm and σL = 0.08mm

Question III.1 (6) If the following hypothesis test is carried out:

$H_0: \sigma^2 = 0.08^2$
$H_1: \sigma^2 > 0.08^2$
the following P-value and conclusion are obtained, if α = 5% is used:

Answer: The usual statistic for this test is:

$$\chi^2 = \frac{15 \cdot 0.0959^2}{0.08^2} = 21.6$$
The critical value for the test is $\chi^2_{0.05}$ with ν = 15 degrees of freedom, so $\chi^2_{0.05}(15) = 24.996$ (from a table or in R: qchisq(0.95,15)). This means that we cannot reject the null hypothesis (as 21.6 is NOT larger than 24.996). It also means that the P-value is above 0.05, which by the way could also be found in R as:

> 1-pchisq(15*sd(x)^2/0.08^2,15)
[1] 0.1197993

So the correct answer is:

2  P-value> 0.05, the null hypothesis cannot be rejected

Question III.2 (7) If the following hypothesis test is carried out:

H0 : µ = 180
H1 : µ ≠ 180
the following usual test statistic and conclusion are obtained, if α = 5% is used:


The usual t-test statistic becomes:

$$t = \frac{180.05 - 180}{0.0959/\sqrt{16}} = 2.09$$
And the critical values become $\pm t_{0.025}(15) = \pm 2.13$. So since the observed t is within the critical limits, we cannot reject the null hypothesis. So the correct answer is:

4  Test statistic: 2.09. Conclusion: the null hypothesis cannot be rejected
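The same test can be run on the raw measurements with `t.test`. This is a sketch; the vector name `x` follows the earlier R line in this exercise, and the data are typed in from the table above.

```r
# The 16 measured profile lengths (mm)
x <- c(180.02, 180.00, 180.01, 179.97, 179.92, 180.05, 179.94, 180.10,
       180.24, 180.12, 180.13, 180.22, 179.96, 180.10, 179.96, 180.06)
res <- t.test(x, mu = 180)            # two-sided one-sample t-test of H0: mu = 180
t_obs  <- unname(res$statistic)
t_crit <- qt(0.975, df = length(x) - 1)
c(t = t_obs, critical = t_crit, p = res$p.value)
```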

Question III.3 (8) A 90%-confidence interval for µ becomes:


The one-sample 90%-confidence interval formula reads:

$$180.05 \pm t_{0.05}(15) \cdot \frac{0.0959}{\sqrt{16}}$$
giving: (using in R: qt(0.95,15))
$$180.05 \pm 1.753 \cdot \frac{0.0959}{\sqrt{16}}$$
So the correct answer is:

5  180.05 ± 1.753 · 0.0959/4

Question III.4 (9) A 99%-confidence interval for σ becomes:


The variance confidence interval formula reads:

$$\frac{15 \cdot 0.0959^2}{\chi^2_{0.005}} < \sigma^2 < \frac{15 \cdot 0.0959^2}{\chi^2_{0.995}}$$

And taking the square root of everything, using from R: (or from χ2 -table)

> qchisq(c(0.005,0.995),15)
[1] 4.600916 32.801321

$$\sqrt{\frac{15 \cdot 0.0092}{32.801}} < \sigma < \sqrt{\frac{15 \cdot 0.0092}{4.601}}$$
So the correct answer is:

1  $\sqrt{\frac{15 \cdot 0.0092}{32.801}} < \sigma < \sqrt{\frac{15 \cdot 0.0092}{4.601}}$
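The exam answer leaves the interval as an expression. As a sketch, the limits can be evaluated numerically in R from the summary statistics; the variable names are illustrative.

```r
s <- 0.0959; n <- 16
# Upper-tail quantiles: the 0.995 quantile gives the lower limit, the 0.005 quantile the upper
chi2 <- qchisq(c(0.995, 0.005), df = n - 1)
ci_sigma <- sqrt((n - 1) * s^2 / chi2)
ci_sigma
```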

Question III.5 (10) The mean profile length for a new production of profiles is to be determined by a new sample. Assuming that the standard deviation is approximately 0.1 mm, so σ = 0.1 mm, and that a 95% confidence interval should have a width of only 0.05 mm, how many profiles should then be measured?


The one-sample confidence interval sample size formula reads (with E = the wanted maximal error):
$$n = \left(\frac{z_{0.025} \cdot \sigma}{E}\right)^2$$
Since a confidence interval is ± the maximal error, that is, the width of the confidence interval
is twice the maximal error, the wanted E = 0.05/2 = 0.025, and so the correct answer is:

3  $(1.96 \cdot 0.1/0.025)^2$, hence at least 62
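The sample size calculation can be sketched in R as follows, rounding up to the next whole profile. The names `E` and `n_needed` are illustrative.

```r
sigma <- 0.1
E <- 0.05 / 2                      # maximal error = half the wanted interval width
n_needed <- ceiling((qnorm(0.975) * sigma / E)^2)
n_needed
```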

Exercise IV

A parallel connection of two electrical resistors R1 and R2 gives a total resistance, RP , which
is given by:
$$R_P = \frac{R_1 \cdot R_2}{R_1 + R_2}$$
In a specific parallel connection two resistors with the following values and standard deviations are used: (in Ohm)
R1 = 500, σ1 = 25
R2 = 100, σ2 = 5
Question IV.1 (11) The variance of the total resistance Rp is approximately:


We must use the nonlinear error propagation rule with

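$$f(x, y) = \frac{xy}{x + y}$$
We find the derivatives (using the basic rule for differentiating a quotient of two functions):
$$\frac{\partial f(x, y)}{\partial x} = \frac{y(x + y) - xy(1)}{(x + y)^2} = \frac{y^2}{(x + y)^2}$$
$$\frac{\partial f(x, y)}{\partial y} = \frac{x(x + y) - xy(1)}{(x + y)^2} = \frac{x^2}{(x + y)^2}$$
So, using the error propagation rule:
$$\mathrm{Var}(R_P) \approx \left(\frac{R_2^2}{(R_1 + R_2)^2}\right)^2 \sigma_1^2 + \left(\frac{R_1^2}{(R_1 + R_2)^2}\right)^2 \sigma_2^2$$
And plugging in the given numbers:
$$\left(\frac{100^2}{600^2}\right)^2 25^2 + \left(\frac{500^2}{600^2}\right)^2 5^2 = 12.5$$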
So the correct answer is:

4  $\left(\frac{100^2}{600^2}\right)^2 25^2 + \left(\frac{500^2}{600^2}\right)^2 5^2 = 12.5$
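The error propagation can be cross-checked in R by letting `D()` differentiate the resistance formula symbolically. This is a sketch, not part of the exam solution; the object names are illustrative.

```r
f    <- expression(R1 * R2 / (R1 + R2))   # total resistance of the parallel connection
vals <- list(R1 = 500, R2 = 100)
dR1  <- eval(D(f, "R1"), vals)            # partial derivative w.r.t. R1 at the given values
dR2  <- eval(D(f, "R2"), vals)            # partial derivative w.r.t. R2 at the given values
var_RP <- dR1^2 * 25^2 + dR2^2 * 5^2      # first-order (linearised) variance approximation
var_RP
```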

Exercise V

A company is to use some pipes. It is important for the use that the internal pipe roughness is
minimized. To find the most suitable pipe, samples obtained from four potential suppliers are
taken. On each pipe 9 measurements of the inner roughness are carried out. The measurement
data is shown in the following table:

Supplier   Roughness measurements        Row average   Row standard deviation
A          17 25 22 21 16 22 23 20 17    20.33         3.08
B          21 25 20 19 24 19 21 21 17    20.78         2.49
C          14 13 16 16 17 24 20 15 19    17.11         3.41
D          18 19 20 12 13 19 20 14 17    16.89         3.10

A manager wants to compare supplier A and B with no use of any normal distribution assumption, and had the following lines run in R:

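# Roughness data for suppliers A and B (from the table above)
xa = c(17, 25, 22, 21, 16, 22, 23, 20, 17)
xb = c(21, 25, 20, 19, 24, 19, 21, 21, 17)
k = 10000
Asamples = replicate(k, sample(xa, replace = TRUE))
Bsamples = replicate(k, sample(xb, replace = TRUE))
mymeandifs = apply(Asamples, 2, mean) - apply(Bsamples, 2, mean)
hist(mymeandifs)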

The histogram made in the last line shows the bootstrap-distribution of the mean differences:

[Figure: Histogram of mymeandifs; horizontal axis from −6 to 4]


Question V.1 (12) What is the only reasonable conclusion based on the R-analysis among the
following options:


The bootstrap distribution says something about the mean difference: it is clearly seen from the figure that 0 is within the 95% bootstrap confidence interval for the mean difference, even though the exact quantiles are not given. So, using the bootstrap-based confidence interval as a hypothesis test method, we can conclude that at the 5% level there is no significant difference between the two means. None of the other answers make any sense. Answer 2) in particular is clearly wrong: there is much more than 0.5% of the distribution on both sides of 0. So the correct answer is:

3  There is at level 5% no significant difference between the mean roughnesses of supplier A and B
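The exact quantiles that the figure does not show can be obtained with `quantile`. The sketch below repeats the resampling in a compact form; the seed is an arbitrary choice made here only for reproducibility, and the names `difs` and `ci` are illustrative.

```r
set.seed(1)  # arbitrary seed, assumed here only to make the sketch reproducible
xa <- c(17, 25, 22, 21, 16, 22, 23, 20, 17)
xb <- c(21, 25, 20, 19, 24, 19, 21, 21, 17)
k  <- 10000
# Bootstrap distribution of the difference in means (A minus B)
difs <- replicate(k, mean(sample(xa, replace = TRUE)) - mean(sample(xb, replace = TRUE)))
ci <- quantile(difs, c(0.025, 0.975))   # 95% percentile bootstrap interval
ci
```

If 0 lies between the two quantiles, the difference is not significant at the 5% level by this method.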

Question V.2 (13) A usual analysis of variance of the whole data set is now carried out. It is reported that $SS_{total} = \sum_{i=1}^{4}\sum_{j=1}^{9}(y_{ij} - \bar{y})^2 = 410.224$. What will the F-test statistic for the usual hypothesis in this situation, µ1 = µ2 = µ3 = µ4, and the conclusion with α = 5% be?

The F-statistic from the one-way ANOVA in this situation is:

$$F = \frac{MS(Tr)}{MSE} = \frac{SS(Tr)/3}{SSE/32}$$
where
$$SS_{total} = SS(Tr) + SSE$$

We can compute SSE and MSE from the four standard deviations, since SSE is the sum of the within-group sums of squares:
$$SSE = 8s_1^2 + 8s_2^2 + 8s_3^2 + 8s_4^2 = 295.3968$$
using the four standard deviations as given. (Without rounding errors: SSE = 295.33.) So:
$$SS(Tr) = SS_{total} - SSE = 410.22 - 295.33 = 114.89$$
$$MS(Tr) = 114.89/3 = 38.3$$
(and this is the same with or without rounding error), and
$$MSE = SSE/32 = 9.23$$
$$F = \frac{38.3}{9.23} = 4.15$$
And since the critical value is F0.05 (3, 32) = 2.90 (In R: qf(0.95,3,32) or linear interpolation
in an F-table) we reject the hypothesis. So the correct answer is:

4  $F = \frac{38.3}{9.23} = 4.15$, so the hypothesis is rejected
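The hand calculation can be compared with R's own ANOVA table computed from the raw data. A sketch, with the data typed in from the table and illustrative object names:

```r
roughness <- c(17, 25, 22, 21, 16, 22, 23, 20, 17,   # supplier A
               21, 25, 20, 19, 24, 19, 21, 21, 17,   # supplier B
               14, 13, 16, 16, 17, 24, 20, 15, 19,   # supplier C
               18, 19, 20, 12, 13, 19, 20, 14, 17)   # supplier D
supplier <- factor(rep(c("A", "B", "C", "D"), each = 9))
tab    <- anova(lm(roughness ~ supplier))            # one-way ANOVA table
F_obs  <- tab[["F value"]][1]
F_crit <- qf(0.95, 3, 32)
c(F = F_obs, critical = F_crit)
```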

Question V.3 (14) The variances of suppliers C and D are to be compared, and the following
hypothesis should be tested: (only based on the data for these two suppliers)
$H_0: \sigma_C^2 = \sigma_D^2$
$H_1: \sigma_C^2 \neq \sigma_D^2$

The following usual test statistic and critical value with α = 0.10 are obtained:


The usual test statistic is the larger variance divided by the smaller:
$$F = \frac{s_C^2}{s_D^2} = \frac{3.41^2}{3.10^2} = 1.21$$
And the critical value hence comes from the F(8, 8)-distribution: $F_{\alpha/2}(8, 8) = F_{0.05}(8, 8) = 3.44$ (in R: qf(0.95,8,8) or in a table). So the correct answer is:

2  Test statistic: $\frac{3.41^2}{3.10^2}$. Critical value: 3.44

Exercise VI

A company that sells outdoor lighting, gets a lamp produced in 3 material variations: in
copper, with painted surface and with stainless steel. The lamps are sold partly in Denmark
and partly for export. For 250 lamps the distribution of sales between the three variants and
Denmark/export are depicted. The data is shown in the following table.

                         Denmark   Export
Copper variant           7.2%      6.4%
Painted variant          28.0%     34.8%
Stainless steel variant  8.8%      14.8%

Question VI.1 (15) Is there a significant difference between the proportion exported and the
proportion sold in Denmark? (With α = 0.05)


The situation asked about here is a "one proportion" case, where 44% out of 250 = 110 are sold in Denmark (and hence 250 − 110 = 140 for export). The standard statistic for the hypothesis test H0 : p = 0.5 is:
$$z = \frac{140 - 250 \cdot 0.5}{\sqrt{250 \cdot 0.5 \cdot 0.5}} = 1.90$$
And as it is a two-tailed alternative, the critical values are $\pm z_{0.025} = \pm 1.96$.

So the correct answer is:

1  No, since $15/\sqrt{250/4} = 1.90$ is within ±1.96
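A short R sketch of the same one-proportion z-test, using the counts derived above (the variable names are illustrative):

```r
n <- 250
export <- 140                      # 56% of 250 lamps were exported
z <- (export - n * 0.5) / sqrt(n * 0.5 * 0.5)
z_crit <- qnorm(0.975)
c(z = z, critical = z_crit)
```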

Question VI.2 (16) The relevant critical value to use for testing whether there is a significant
difference in how the sold variants are distributed in Denmark and for export: (with α = 0.05)


This is a so-called null hypothesis of homogeneity in a 3 × 2 frequency table (r × c table). The critical value for the $\chi^2$-test is based on the $\chi^2$-distribution with (r − 1)(c − 1) = 2 degrees of freedom. So the correct answer is:

3  $\chi^2_{0.05}(2) = 5.991$
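The degrees of freedom and the critical value can be confirmed in R by converting the percentages back to counts out of 250 and running `chisq.test`. This is a sketch; the conversion to counts and the object names are assumptions made here for illustration.

```r
pct <- matrix(c( 7.2,  6.4,
                28.0, 34.8,
                 8.8, 14.8), ncol = 2, byrow = TRUE,
              dimnames = list(c("Copper", "Painted", "Stainless"), c("Denmark", "Export")))
counts <- round(pct / 100 * 250)   # back to numbers of lamps
res    <- chisq.test(counts)       # test of homogeneity in the 3 x 2 table
df     <- unname(res$parameter)
crit   <- qchisq(0.95, df)
c(df = df, critical = crit)
```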

Exercise VII

In a clinical trial of a cholesterol-lowering agent, the cholesterol level (in mMol/l) of 15 patients has been measured before treatment and 3 weeks after starting treatment. Data are listed in the following table:
Patient 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Before 9.1 8.0 7.7 10.0 9.6 7.9 9.0 7.1 8.3 9.6 8.2 9.2 7.3 8.5 9.5
After 8.2 6.4 6.6 8.5 8.0 5.8 7.8 7.2 6.7 9.8 7.1 7.7 6.0 6.6 8.4

The following is run in R:


with the following results:

> t.test(x1,x2,var.equal=TRUE)

Two Sample t-test

data: x1 and x2
t = 3.3206, df = 28, p-value = 0.002505
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.4648623 1.9618043
sample estimates:
mean of x mean of y
8.600000 7.386667

> t.test(x1,x2,pair=TRUE)

Paired t-test

data: x1 and x2
t = 7.3407, df = 14, p-value = 3.672e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.8588225 1.5678442
sample estimates:
mean of the differences

Question VII.1 (17) Can a significant decrease in cholesterol levels be demonstrated from these data with α = 0.001?


The first analysis considers the data as two independent samples, whereas the second analysis is the paired analysis. The latter is the one that fits the situation, so we read off the P-value for the two-tailed paired t-test to be 0.000003672, and the one-tailed P-value actually asked for will then be half of this. So the correct answer is:

5  Yes, since the relevant P-value is less than 0.001
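The one-tailed paired test can also be requested directly in R. In this sketch the names `x1` and `x2` follow the R output above, with the data typed in from the table.

```r
x1 <- c(9.1, 8.0, 7.7, 10.0, 9.6, 7.9, 9.0, 7.1, 8.3, 9.6, 8.2, 9.2, 7.3, 8.5, 9.5)  # before
x2 <- c(8.2, 6.4, 6.6, 8.5, 8.0, 5.8, 7.8, 7.2, 6.7, 9.8, 7.1, 7.7, 6.0, 6.6, 8.4)   # after
# H1: mean(before - after) > 0, i.e. a decrease in cholesterol
res <- t.test(x1, x2, paired = TRUE, alternative = "greater")
res$p.value
```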

Question VII.2 (18) What is the median for the cholesterol measurements for the patients
before treatment?


The observations in ordered sequence are:

7.1 7.3 7.7 7.9 8.0 8.2 8.3 8.5 9.0 9.1 9.2 9.5 9.6 9.6 10.0

The middle one of these 15 observations (number 8) is 8.5. So the correct answer is:

2  8.5

Exercise VIII

When producing plastic tubes, the tube diameter must be continuously controlled. In a test production two pieces of tube with two different settings of the machine were made. The diameter is measured at 10 places on each tube with the following results: (in mm)

Setting 1: n1 = 10, x̄1 = 5.996, s1 = 0.0082

Setting 2: n2 = 10, x̄2 = 6.014, s2 = 0.0121

Question VIII.1 (19) The two variances cannot be shown to be significantly different (with
α = 0.10), since:


The usual test statistic is the larger variance divided by the smaller:

$$F = \frac{0.0121^2}{0.0082^2} = 2.177$$
And the critical value hence comes from the F(9, 9)-distribution: $F_{\alpha/2}(9, 9) = F_{0.05}(9, 9) = 3.18$ (in R: qf(0.95,9,9) or in a table). So the correct answer is:

2  $\frac{0.0121^2}{0.0082^2} = 2.177 < F_{0.05}(9, 9) = 3.18$

Question VIII.2 (20) Assume that the variances within the two settings are the same. What
is then the 95%-confidence interval of the mean diameter difference?


This is the standard two independent samples (pooled variance) confidence interval:
$$(6.014 - 5.996) \pm t_{0.025}(18) \cdot s_p \sqrt{1/10 + 1/10}$$
where $t_{0.025}(18) = 2.101$ (in R: qt(0.975,18)) and
$$s_p = \sqrt{\frac{0.0082^2 + 0.0121^2}{2}} = 0.0103$$

5  $0.018 \pm 2.101 \cdot 0.0103 \sqrt{2/10}$
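A sketch of the pooled-variance interval in R, computed from the summary statistics given for the two settings (variable names are illustrative):

```r
n1 <- 10; m1 <- 5.996; s1 <- 0.0082
n2 <- 10; m2 <- 6.014; s2 <- 0.0121
# Pooled standard deviation and 95% confidence interval for the mean difference
sp   <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
half <- qt(0.975, n1 + n2 - 2) * sp * sqrt(1 / n1 + 1 / n2)
ci   <- (m2 - m1) + c(-1, 1) * half
ci
```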

Exercise IX

At the local elections in Denmark in November 2013 the Social Democrats (A) had p = 29.5%
of the votes at the country level. An early so-called exit poll estimated that they would only
get 22.7% of the votes. Suppose the exit poll was based on 740 people out of which then 168
people reported having voted for A.

Question IX.1 (21) At the time of the exit poll the p was of course not known. If the following
hypothesis is tested based on the exit poll:
H0 : p = 0.295
H1 : p ≠ 0.295
the following test statistic and conclusion are obtained: (with α = 0.001)


The one-proportions test statistic is:

$$Z = \frac{168 - 740 \cdot 0.295}{\sqrt{740 \cdot 0.295 \cdot (1 - 0.295)}} = -4.05$$
And the critical values are $\pm z_{0.0005} = \pm 3.291$ (in R: qnorm(0.9995)). So the correct answer is:

4  Test statistic: −4.05. Conclusion: We reject the null hypothesis, since $-4.05 < -z_{0.0005} = -3.291$

Question IX.2 (22) A 95%-confidence interval for p based on the exit poll becomes:


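With $\hat{p} = 168/740 = 0.227$, the usual large-sample confidence interval for a proportion is $\hat{p} \pm z_{0.025}\sqrt{\hat{p}(1 - \hat{p})/n}$. So the correct answer is:

4  $0.227 \pm 1.96 \cdot \sqrt{\frac{0.227 \cdot 0.773}{740}}$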
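Both the test statistic from Question IX.1 and the interval from Question IX.2 can be evaluated in R. The sketch below uses the standard large-sample formulas; the variable names are illustrative.

```r
n <- 740; x <- 168; p0 <- 0.295
# One-proportion z-test of H0: p = 0.295
z <- (x - n * p0) / sqrt(n * p0 * (1 - p0))
# Large-sample 95% confidence interval based on the observed proportion
phat <- x / n
ci   <- phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)
list(z = z, critical = qnorm(0.9995), ci = ci)
```

Note that the election result 0.295 lies outside the interval, in line with the rejection in Question IX.1.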

Question IX.3 (23) Based on a scenario that a voting proportion is about 30%, how large an
exit poll should be done to achieve a 99%-confidence interval having a width of 0.01?


The proportion sample size formula using a guess of p = 0.3 reads:

$$n = 0.3 \cdot 0.7 \cdot (z_{0.005}/E)^2$$

where E is the maximal error and since the confidence interval is plus/minus the maximal error,
we should take E = 0.01/2 and z0.005 = 2.576 (in R: qnorm(0.995)) So the correct answer is:

5  $0.3 \cdot 0.7 \cdot (2.576/(0.01/2))^2 \approx 55741$ persons
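A sketch of the sample size calculation in R. The rounded quantile 2.576 is used to reproduce the number in the answer; using the unrounded `qnorm(0.995)` gives a slightly smaller value, so the last digits depend on rounding. The variable names are illustrative.

```r
p_guess <- 0.3
E <- 0.01 / 2                                   # maximal error = half the interval width
n_rounded <- ceiling(p_guess * (1 - p_guess) * (2.576 / E)^2)         # as in the answer
n_exact   <- ceiling(p_guess * (1 - p_guess) * (qnorm(0.995) / E)^2)  # unrounded quantile
c(rounded_z = n_rounded, exact_z = n_exact)
```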

Exercise X

In a study of pollution in a water stream, the concentration of pollution is measured at 5 different locations. At each location four water samples are taken and the concentration is measured (in mg/l). The result of the analysis is shown in the table below:

Location   1      2      3      4      5
           9.9    8.8    10.3   9.5    10.5
           9.1    10.0   11.0   10.9   11.3
           9.7    10.1   9.5    9.9    11.9
           8.5    9.8    10.2   10.6   12.3

In addition the information is given that the locations are at different distances to the pollution
source. In the table below, these distances and the average pollution are given:

Distance to the pollution source (in km)   10    8      6      4       2
Average concentration                      9.3   9.675  10.25  10.225  11.5

Two relevant analyses are run in R, first:



with the following result, where a part of the usual R-output has been omitted (and substituted by an "X"):

Analysis of Variance Table

Response: Concentration
          Df Sum Sq Mean Sq F value Pr(>F)
Location   X 11.113       X       X      X
Residuals  X  6.465       X

and next: (2. analysis)


with the following result:

lm(formula = Concentration ~ Distance)

1 2 3 4 5
0.10 -0.02 0.06 -0.46 0.32

Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.67500 0.34679 33.665 5.76e-05 ***
Distance -0.24750 0.05228 -4.734 0.0179 *
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3307 on 3 degrees of freedom

Multiple R-squared: 0.8819, Adjusted R-squared: 0.8426
F-statistic: 22.41 on 1 and 3 DF, p-value: 0.01787

Question X.1 (24) Based on the first analysis, what will the test statistic and the critical value
(with α = 0.05) for the hypothesis of no difference in the mean values for the five locations be?


The first analysis is a one-way ANOVA, where the test statistic is

$$F = \frac{MS(Tr)}{MSE} = \frac{SS(Tr)/4}{SSE/15} = \frac{11.113/4}{6.465/15} = 6.45$$
And the critical value comes from the F(4, 15)-distribution: $F_{0.05}(4, 15) = 3.06$. So the correct answer is:

1  Test statistic = 6.45 and critical value = 3.06
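The R call for the first analysis is not shown in the text. A sketch of a call that would produce such a table from the data is given below; the names `Concentration` and `Location` follow the printed output, while the rest of the setup is an assumption made for illustration.

```r
Concentration <- c(9.9, 9.1, 9.7, 8.5,       # location 1
                   8.8, 10.0, 10.1, 9.8,     # location 2
                   10.3, 11.0, 9.5, 10.2,    # location 3
                   9.5, 10.9, 9.9, 10.6,     # location 4
                   10.5, 11.3, 11.9, 12.3)   # location 5
Location <- factor(rep(1:5, each = 4))
tab    <- anova(lm(Concentration ~ Location))   # one-way ANOVA table
F_obs  <- tab[["F value"]][1]
F_crit <- qf(0.95, 4, 15)
c(F = F_obs, critical = F_crit)
```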

Question X.2 (25) A 90%-confidence interval for the mean difference between Location 1 and
2 becomes:


The mean difference between Location 1 and 2 is:

x̄2 − x̄1 = 9.675 − 9.3 = 0.375
We use the post hoc pairwise confidence interval method for the ANOVA setting:
$$0.375 \pm t_{0.05}(15) \sqrt{MSE(1/4 + 1/4)}$$
where MSE = SSE/15 = 6.465/15 = 0.431 and $t_{0.05}(15) = 1.753$. So the correct answer is:

3  $0.375 \pm 1.753 \cdot \sqrt{0.431 \cdot 1/2}$

Question X.3 (26) What are the parameter estimates for the three unknown parameters in
the usual linear regression model that is underlying the second analysis: 1) The intercept, 2)
the slope and 3) (residual) standard deviation?


Given the knowledge of the R-output structure, the three values can be read off directly from
the output. So the correct answer is:

3  1) 11.675, 2) −0.2475 and 3) 0.3307

Question X.4 (27) How large a part of the variation in concentration can be explained by the distance to the pollution source?

The amount of Y-variation explained by the X-variable can be found from the squared correlation, which can be read off directly from the output ("Multiple R-squared"). So the correct answer is:

1  88.2%

Question X.5 (28) A 95%-confidence interval for the expected pollution concentration 7km
from the pollution source becomes:


The wanted number is estimated by the point on the line (using $x_0 = 7$):
$$-0.2475 \cdot 7 + 11.675 = 9.94$$
and the confidence interval is given by
$$9.94 \pm t_{0.025}(3) \cdot s_e \sqrt{\frac{1}{5} + \frac{(7 - 6)^2}{S_{xx}}}$$
where $S_{xx} = 4^2 + 2^2 + 0^2 + 2^2 + 4^2 = 40$ and $t_{0.025}(3) = 3.182$ (in R: qt(0.975,3)). We have that
$$3.182 \cdot 0.3307 \sqrt{\frac{1}{5} + \frac{1}{40}} = 0.50$$
So the correct answer is:

4  9.94 ± 0.50
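The same interval can be obtained from `predict` on the fitted regression. In this sketch the names `Distance` and `Concentration` follow the model formula in the output above, and the data are the five distances and location averages from the table.

```r
Distance      <- c(10, 8, 6, 4, 2)
Concentration <- c(9.3, 9.675, 10.25, 10.225, 11.5)
fit  <- lm(Concentration ~ Distance)
# Confidence interval for the expected concentration at 7 km
pred <- predict(fit, newdata = data.frame(Distance = 7),
                interval = "confidence", level = 0.95)
half_width <- pred[1, "upr"] - pred[1, "fit"]
c(estimate = pred[1, "fit"], half_width = half_width)
```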

Exercise XI

Some plastic tubes for which the tensile strength is essential are to be produced. Hence, sample
tube items are produced and tested, where the tensile strength is determined. Two different
granules and four possible suppliers are used in the trial. The measurement results (in MPa)
from the trial are listed in the table below.

g1 g2
Supplier a 34.2 33.1
Supplier b 34.8 31.2
Supplier c 31.3 30.2
Supplier d 31.9 31.6

The following is run in R:


with the following result:

Analysis of Variance Table

Response: Y
          Df  Sum Sq Mean Sq F value Pr(>F)
Supplier   3 10.0338  3.3446  3.2537 0.1792
Granule    1  4.6512  4.6512  4.5249 0.1233
Residuals  3  3.0837  1.0279

Question XI.1 (29) What distribution has been used to find the P-value 0.1792?


The P-value is from the F-test from a two-way ANOVA using the F (3, 3)-distribution. So the
correct answer is:

2  The F-distribution with the degrees of freedom ν1 = 3 and ν2 = 3

Question XI.2 (30) What is the most correct conclusion based on the analysis? (use α = 0.05)


Since both of the P-values are larger than 0.05, neither of the two usual hypothesis tests (of no group difference) is significant. So the correct answer is:

3  No significant difference can be demonstrated between the means for either the 4 suppliers or the 2 granules
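The R call behind the table is not shown in the text. A sketch of a two-way ANOVA that would give such a table is shown below; `Y`, `Supplier` and `Granule` follow the names in the output, and the data layout is an assumption made here. The last line confirms that the P-value for Supplier is an upper tail probability in the F(3, 3)-distribution.

```r
Y <- c(34.2, 33.1,    # supplier a: g1, g2
       34.8, 31.2,    # supplier b
       31.3, 30.2,    # supplier c
       31.9, 31.6)    # supplier d
Supplier <- factor(rep(c("a", "b", "c", "d"), each = 2))
Granule  <- factor(rep(c("g1", "g2"), times = 4))
tab    <- anova(lm(Y ~ Supplier + Granule))     # two-way ANOVA without interaction
p_vals <- tab[["Pr(>F)"]][1:2]
p_from_F <- pf(tab[["F value"]][1], 3, 3, lower.tail = FALSE)
c(Supplier = p_vals[1], Granule = p_vals[2], check = p_from_F)
```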


