Introduction to Statistics
Lifetime (hours)   Frequency
0–199              143
200–399             97
400–599             64
600–799             51
800–999             14
1000–1199           14
1200–1999           17
Construct a histogram and cumulative frequency polygon for these data. Estimate the percentage
of bulbs with lifetime less than 480 hours.
Answer: Lifetimes cannot be negative, so the class intervals are [0, 199.5), [199.5, 399.5), [399.5, 599.5), and so on.
[Figure: histogram of bulb lifetimes; horizontal axis Lifetime (hours)]
Adjust the height of the rectangle for the 1200–2000 interval so that histogram area is proportional to frequency; this interval is four times as wide as a 200-hour class, so its height is 17/4 = 4.25 per 200 hours. If the vertical axis is frequency per interval of 200 hours, the height of the [0, 199.5) class is $143 \times 200/199.5 = 143.4$, to allow for the first class not being of width 200.
Lifetime (hours)   Cumulative frequency
0.0                  0
199.5              143
399.5              240
599.5              304
799.5              355
999.5              369
1199.5             383
1999.5             400
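[Figure: cumulative frequency polygon (Cumulative freq. against Lifetime (hours)), with an enlarged view around 480 hours]

Interpolating linearly between the points at 399.5 and 599.5 hours, the cumulative frequency at 480 hours is
$$240 + \frac{480 - 399.5}{200}\,(304 - 240) = 265.8.$$
Required percentage is
$$\frac{265.8}{400} \times 100 = 66.4\%.$$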
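A minimal R sketch of the interpolation above, assuming the cumulative frequencies in the table (with 1199.5 as the upper boundary of the 1000–1199 class). `approx()` performs linear interpolation along the polygon; the variable names are illustrative.

```r
# Upper class boundaries and cumulative frequencies (bulb lifetime data)
upper   <- c(0, 199.5, 399.5, 599.5, 799.5, 999.5, 1199.5, 1999.5)
cumfreq <- c(0, 143, 240, 304, 355, 369, 383, 400)

# Read the cumulative frequency polygon at 480 hours by linear interpolation
cf480  <- approx(upper, cumfreq, xout = 480)$y
pct480 <- 100 * cf480 / max(cumfreq)

round(c(cumfreq_at_480 = cf480, percent = pct480), 1)
```

Only the two boundaries either side of 480 hours affect the result, so the correction of the 1199.5 boundary does not change this estimate.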
Number x of segments   Number of branches with x segments   Cumulative frequency
1                       3                                     3
2                       0                                     3
3                       6                                     9
4                       7                                    16
5                       8                                    24
6                      18                                    42
7                       8                                    50
8                       0                                    50
9                       2                                    52

[Figure: cumulative frequency plot; horizontal axis Number of segments, vertical axis Cumulative freq.]
Class      Frequency
(−4, −2]   10
(−2, 0]    43
(0, +2]    39
(+2, +4]    5
(+4, +6]    3
Show that the sample mean and sample standard deviation for these data are $\bar{x} = -0.04$ and $s = 1.717$, respectively.
Answer:
Class          Class frequency f   Class mid-point x   fx     fx²
−4 < x ≤ −2    10                  −3                  −30     90
−2 < x ≤ 0     43                  −1                  −43     43
0 < x ≤ +2     39                  +1                   39     39
+2 < x ≤ +4     5                  +3                   15     45
+4 < x ≤ +6     3                  +5                   15     75
Totals         n = 100                                  −4    292
$$\bar{x} = \frac{-4}{100} = -0.04,$$
$$s^2 = \frac{1}{99}\left\{292 - 100\,(-0.04)^2\right\} = 2.9479,$$
so $s = \sqrt{s^2} = \sqrt{2.9479} = 1.717$.
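The grouped calculation can be checked in R by expanding each class to its mid-point. This is exactly the grouped-data approximation used in the table: every observation is placed at its class mid-point.

```r
mid  <- c(-3, -1, 1, 3, 5)    # class mid-points
freq <- c(10, 43, 39, 5, 3)   # class frequencies, n = 100

# Treat every observation in a class as if it sat at the mid-point
x <- rep(mid, times = freq)

xbar <- mean(x)   # equals sum(f * x) / n
s    <- sd(x)     # uses the n - 1 divisor, as in s^2 above
c(mean = xbar, var = round(var(x), 4), sd = round(s, 3))
```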
Interarrival time (hours)   Frequency
0–19                        16
20–39                       13
40–59                       17
60–79                        4
80–99                        4
100–119                      3
120–139                      1
140–159                      1
160–179                      1
Determine the median and semi-interquartile range. Explain why this pair of statistics might be
preferred to the mean and standard deviation for these data.
Answer:
Time (hours)   Cumulative frequency
0.0             0
19.5           16
39.5           29
59.5           46
79.5           50
99.5           54
119.5          57
139.5          58
159.5          59
179.5          60
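The histogram for these data is positively skewed, so the median and semi-interquartile range may be preferred to the mean and standard deviation as measures of location and dispersion, respectively: unlike the mean and standard deviation, they are not pulled upwards by the few long interarrival times in the right tail.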
[Figure: histogram of the data; horizontal axis Interarrival time (hours)]
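The answer above does not quote numerical values for the median and quartiles. As a sketch, assuming the usual graphical method of reading the cumulative frequency polygon at n/4, n/2 and 3n/4 (one convention among several), R's `approx()` can invert the polygon:

```r
# Upper class boundaries (hours) and cumulative frequencies, n = 60
upper   <- c(0, 19.5, 39.5, 59.5, 79.5, 99.5, 119.5, 139.5, 159.5, 179.5)
cumfreq <- c(0, 16, 29, 46, 50, 54, 57, 58, 59, 60)
n <- max(cumfreq)

# Interpolate time as a function of cumulative frequency (inverse of the polygon)
q <- approx(cumfreq, upper, xout = c(n / 4, n / 2, 3 * n / 4))$y
names(q) <- c("Q1", "median", "Q3")

siqr <- (q[["Q3"]] - q[["Q1"]]) / 2   # semi-interquartile range
round(c(q, SIQR = siqr), 2)
```

Under this convention the median is about 40.7 hours and the semi-interquartile range about 20.0 hours; a different quartile convention would shift these values slightly.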
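$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{90}{10} = 9 \text{ minutes}.$$
$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right) = 5.42.$$
Estimate the population variance $\sigma^2$ by $s^2$, with $s = \sqrt{5.42} = 2.33$. Then
$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1},$$
so the 95% confidence interval for $\mu$ is
$$\bar{x} \pm t_9(2.5\%)\,\frac{s}{\sqrt{n}} = 9 \pm (2.262 \times 0.737) = 9 \pm 1.667 = (7.3,\ 10.7).$$
Since 8 minutes lies inside the 95% confidence interval, we would accept $H_0$ in testing $H_0: \mu = 8$ vs. $H_1: \mu \neq 8$ at the 5% significance level.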
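A sketch of the same t-interval in R from the summary statistics quoted above ($n = 10$, $\bar{x} = 9$, $s^2 = 5.42$); `qt(0.975, 9)` supplies the 2.262 taken from tables.

```r
n <- 10; xbar <- 9; s2 <- 5.42          # summary statistics from the text

se    <- sqrt(s2 / n)                   # estimated standard error s / sqrt(n)
tcrit <- qt(0.975, df = n - 1)          # upper 2.5% point of t_9, about 2.262
ci    <- xbar + c(-1, 1) * tcrit * se   # 95% confidence interval

mu0 <- 8
accept_H0 <- ci[1] < mu0 && mu0 < ci[2] # TRUE when 8 lies inside the interval
round(ci, 1)
```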
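$$\mathrm{pr}\{X = x\} = \frac{e^{-1}}{x!}, \qquad x = 0, 1, 2, \ldots,$$
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu}{\sqrt{0.0625}} = \frac{\bar{X} - \mu}{0.25},$$
where $Z \sim N(0, 1)$ if $H_0$ is true.
For $\alpha = 0.05$ with a two-sided test, $z_{\alpha/2} = 1.96$. The critical region is $Z < -1.96$ or $Z > 1.96$. The observed value is $z = -0.45/0.25 = -1.8$. This does not lie in the critical region, so accept $H_0$.
For $\alpha = 0.05$ with a one-sided test, $z_{\alpha} = 1.645$. The critical region is $Z < -1.645$. The observed value $z = -1.8$ lies in the critical region, so reject $H_0$.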
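Both decisions can be reproduced in R from the observed statistic alone. `qnorm()` replaces the tabulated critical values, and the one-sided test is taken in the lower tail, matching the critical region $Z < -1.645$ above.

```r
z     <- -0.45 / 0.25   # observed value of Z from the text
alpha <- 0.05

# Two-sided test: reject if |z| exceeds the upper alpha/2 point (1.96)
reject_two_sided <- abs(z) > qnorm(1 - alpha / 2)

# One-sided (lower-tail) test: reject if z is below the lower alpha point (-1.645)
reject_one_sided <- z < qnorm(alpha)

c(two_sided = reject_two_sided, one_sided = reject_one_sided)
c(p_two_sided = 2 * pnorm(-abs(z)), p_one_sided = pnorm(z))
```

The p-values make the contrast explicit: the same data fall just outside the two-sided 5% region but inside the one-sided one.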
Employee   Year 1   Year 2
1           3.0      2.8
2           6.7      5.1
3          11.3      8.4
4           5.0      5.0
5           9.4      6.2
6          15.7     12.2
7           8.0     10.0
8          10.0      6.8
9           9.7      6.0
Is there any evidence that the average absenteeism rate is different for the two years?
Answer: The data are paired, since the same employee was studied in each of the two years.
Form the differences $d_i = (\text{year } 1)_i - (\text{year } 2)_i$; the variance $\sigma_d^2$ of the differences must be estimated.
Test $H_0: \mu_d = 0$ vs. $H_1: \mu_d \neq 0$. See lecture 6.
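A sketch of the paired analysis in R, assuming the second column of the table is year 1 and the third is year 2 (the original column headers were lost in extraction). The explicit statistic $\bar{d}/(s_d/\sqrt{n})$ is computed alongside `t.test(..., paired = TRUE)`, which should agree with it.

```r
year1 <- c(3.0, 6.7, 11.3, 5.0, 9.4, 15.7, 8.0, 10.0, 9.7)
year2 <- c(2.8, 5.1, 8.4, 5.0, 6.2, 12.2, 10.0, 6.8, 6.0)

d <- year1 - year2                           # paired differences d_i
n <- length(d)
tstat <- mean(d) / (sd(d) / sqrt(n))         # t on n - 1 = 8 df under H0
tcrit <- qt(0.975, df = n - 1)               # two-sided 5% point, about 2.306

res <- t.test(year1, year2, paired = TRUE)   # same test via the built-in
c(t = tstat, critical = tcrit, p = res$p.value)
```

On these numbers the statistic is about 2.72, above $t_8(2.5\%) = 2.306$, so this sketch would reject $H_0$ at the 5% level.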
Glue thickness x   Joint strength y
0.12               49.8
0.12               46.1
0.13               46.5
0.13               45.8
0.14               44.3
0.14               45.9
Answer: Let X denote the glue thickness and Y the joint strength.
         x      y       x²       y²          xy
         0.12   49.8    0.0144   2480.04     5.976
         0.12   46.1    0.0144   2125.21     5.532
         0.13   46.5    0.0169   2162.25     6.045
         0.13   45.8    0.0169   2097.64     5.954
         0.14   44.3    0.0196   1962.49     6.202
         0.14   45.9    0.0196   2106.81     6.426
Totals   0.78   278.4   0.1018   12934.44   36.135
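$$\bar{x} = \frac{0.78}{6} = 0.13, \qquad \bar{y} = \frac{278.4}{6} = 46.4,$$
$$s_X^2 = \frac{1}{5}\left\{0.1018 - 6(0.13)^2\right\} = 0.00008,$$
$$s_Y^2 = \frac{1}{5}\left\{12934.44 - 6(46.4)^2\right\} = 3.336, \qquad s_{XY} = \frac{1}{5}\left\{36.135 - 6(0.13)(46.4)\right\} = -0.0114.$$
$$r_{XY} = \frac{s_{XY}}{s_X s_Y} = \frac{-0.0114}{\sqrt{0.00008 \times 3.336}} = -0.698.$$
Regression line:
$$y = \bar{y} + \frac{s_{XY}}{s_X^2}(x - \bar{x}) = 46.4 + \frac{-0.0114}{0.00008}(x - 0.13) = 64.925 - 142.5x.$$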
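The hand calculation can be checked in R. `cor()` and `lm()` work from the same sums of squares and products, so they should reproduce $r_{XY} = -0.698$ and the fitted line $y = 64.925 - 142.5x$.

```r
x <- c(0.12, 0.12, 0.13, 0.13, 0.14, 0.14)   # glue thickness
y <- c(49.8, 46.1, 46.5, 45.8, 44.3, 45.9)   # joint strength

r   <- cor(x, y)     # sample correlation r_XY
fit <- lm(y ~ x)     # least-squares regression of y on x
round(r, 3)
coef(fit)            # intercept and slope
```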
Outcome   X   Y   Probability
HHH       3   3   1/8
HHT       2   2   1/8
HTH       2   1   1/8
HTT       1   2   1/8
THH       2   2   1/8
THT       1   1   1/8
TTH       1   2   1/8
TTT       0   3   1/8

Joint probabilities p(x, y):
          x = 0   x = 1   x = 2   x = 3   pY(y)
y = 1     0       1/8     1/8     0       1/4
y = 2     0       1/4     1/4     0       1/2
y = 3     1/8     0       0       1/8     1/4
pX(x)     1/8     3/8     3/8     1/8     Total = 1
Joint probabilities $p(x, y)$ are found by summing the probabilities of all outcomes giving rise to $(X = x, Y = y)$. Thus $p(1, 2) = \mathrm{pr}\{HTT \text{ or } TTH\} = 1/4$.
Marginal probabilities are found by forming row or column sums. Thus, for this worked example,
$$\mathrm{pr}\{X = 2\} = p(2, 1) + p(2, 2) + p(2, 3) = \frac{3}{8}.$$
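(c) If $X = 1$, then
$$\mathrm{pr}\{Y = y \mid X = 1\} = \frac{p(1, y)}{p_X(1)} = \frac{p(1, y)}{3/8}.$$
Thus
$$\mathrm{pr}\{Y = 1 \mid X = 1\} = \frac{1/8}{3/8} = \frac{1}{3}, \qquad \mathrm{pr}\{Y = 2 \mid X = 1\} = \frac{2/8}{3/8} = \frac{2}{3}, \qquad \mathrm{pr}\{Y = 3 \mid X = 1\} = 0.$$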
If X = 1, then the outcome is one of HTT, THT, TTH. In one out of these three cases we observe
Y = 1 and in two out of three we observe Y = 2.
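A short R enumeration of the eight equally likely outcomes reproduces the joint table and the conditional distribution in (c). The definition of $Y$ used here, the length of the longest run of identical faces, is an assumption inferred from the outcome table (it matches every row).

```r
# All 8 equally likely outcomes of three tosses
tosses <- expand.grid(t1 = c("H", "T"), t2 = c("H", "T"), t3 = c("H", "T"),
                      stringsAsFactors = FALSE)

X <- rowSums(tosses == "H")                                # number of heads
Y <- apply(tosses, 1, function(r) max(rle(r)$lengths))    # longest run (assumed)

# Joint probabilities p(x, y): rows x = 0..3, columns y = 1..3
joint <- table(factor(X, levels = 0:3), factor(Y, levels = 1:3)) / nrow(tosses)

rowSums(joint)                    # marginal p_X(x): 1/8, 3/8, 3/8, 1/8
joint["1", ] / sum(joint["1", ])  # pr{Y = y | X = 1}: 1/3, 2/3, 0
```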
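[Figure: f(x, y), showing the region x + y < z for the cases z < 1 and z > 1]
$$\mathrm{pr}\{X + Y \le z\} = \begin{cases} \tfrac{1}{2}z^2 & \text{if } 0 < z < 1,\\ 1 - \tfrac{1}{2}(2 - z)^2 & \text{if } 1 \le z < 2. \end{cases}$$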
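(b) If $Z = X + Y$, then $Z$ has cumulative distribution function $F(z)$ where, from (a) above,
$$F(z) = \mathrm{pr}\{X + Y \le z\} = \begin{cases} \tfrac{1}{2}z^2 & \text{if } 0 < z < 1,\\ 1 - \tfrac{1}{2}(2 - z)^2 & \text{if } 1 \le z < 2. \end{cases}$$
The probability density function for $Z$ is then
$$f(z) = \frac{dF(z)}{dz} = \begin{cases} z & \text{if } 0 < z < 1,\\ 2 - z & \text{if } 1 \le z < 2. \end{cases}$$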
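A quick Monte Carlo check in R, assuming (as the shape of $F$ suggests, although the question text is not shown) that $X$ and $Y$ are independent Uniform(0, 1) variables. The empirical CDF of the simulated sums should agree with $F(z)$ above; `Fz` is an illustrative helper name.

```r
set.seed(1)                       # reproducible simulation
n <- 1e5
z <- runif(n) + runif(n)          # simulated values of Z = X + Y

# Illustrative helper: the CDF F(z) derived above
Fz <- function(z) ifelse(z < 1, z^2 / 2, 1 - (2 - z)^2 / 2)

pts <- c(0.5, 1, 1.5)
emp <- sapply(pts, function(t) mean(z <= t))   # empirical pr{Z <= t}
rbind(empirical = emp, exact = Fz(pts))        # exact: 0.125, 0.5, 0.875
```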
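$$\operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{Var}[X]\operatorname{Var}[Y]}},$$
so
$$\operatorname{cov}(X, Y) = \operatorname{corr}(X, Y)\sqrt{\operatorname{Var}[X]\operatorname{Var}[Y]} = \rho\sqrt{\sigma^2\sigma^2} = \rho\sigma^2.$$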
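Answer:
$$E[T] = E[a_1X_1 + a_2X_2] = a_1E[X_1] + a_2E[X_2] = a_1\mu + a_2\mu = (a_1 + a_2)\mu.$$
If we require $E[T] = \mu$, then $a_1 + a_2 = 1$, so that $a_2 = 1 - a_1$.
Since $E[T] = \mu$, $T$ is said to be an unbiased estimator of the mean $\mu$.
$$\operatorname{Var}[T] = \operatorname{Var}[a_1X_1 + a_2X_2] = a_1^2\operatorname{Var}[X_1] + a_2^2\operatorname{Var}[X_2] = a_1^2\sigma^2 + a_2^2\sigma^2 = (a_1^2 + a_2^2)\sigma^2.$$
Since $a_2 = 1 - a_1$, $\operatorname{Var}[T] = \{a_1^2 + (1 - a_1)^2\}\sigma^2 = (2a_1^2 - 2a_1 + 1)\sigma^2$. Differentiate this with respect to $a_1$ to find the minimum:
$$\frac{d}{da_1}\operatorname{Var}[T] = (4a_1 - 2)\sigma^2,$$
which is zero when $a_1 = \tfrac{1}{2}$. Hence $\operatorname{Var}[T]$ is a minimum when $a_1 = a_2 = \tfrac{1}{2}$, so $T = \tfrac{1}{2}(X_1 + X_2)$.
Alternative derivation: write $a_1 = \tfrac{1}{2} + \delta$, $a_2 = \tfrac{1}{2} - \delta$. Then $\operatorname{Var}[T] = \{(\tfrac{1}{2} + \delta)^2 + (\tfrac{1}{2} - \delta)^2\}\sigma^2 = (\tfrac{1}{2} + 2\delta^2)\sigma^2$, which is smallest when $\delta = 0$.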
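The minimisation can also be checked numerically. This R sketch takes $\sigma^2 = 1$ (any positive value only rescales the objective) and lets `optimize()` search over $a_1 \in [0, 1]$; `var_T` is an illustrative helper name.

```r
# Illustrative helper: Var[T] / sigma^2 as a function of a1, with a2 = 1 - a1
var_T <- function(a1) a1^2 + (1 - a1)^2

opt <- optimize(var_T, interval = c(0, 1))
opt$minimum     # close to 0.5, i.e. a1 = a2 = 1/2
opt$objective   # close to 0.5, i.e. Var[T] = sigma^2 / 2
```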
103
97
103
95
105
94
106
93
108
91
105
95
106
94
At the 5% level of significance, test whether the average noise level of petrol-powered chain saws
is higher than for electric-powered chain saws.
Answer: Testing $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 > \mu_2$, i.e. $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 > 0$.
We have two independent samples with unknown variances, so we need to assume that the variances are equal.
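A sketch of the pooled two-sample t-test in R, assuming the first value in each pair of the chain-saw data is the petrol-powered sample and the second the electric sample (the column headers were lost in extraction). `var.equal = TRUE` imposes the equal-variance assumption and `alternative = "greater"` gives $H_1: \mu_1 > \mu_2$.

```r
petrol   <- c(103, 103, 105, 106, 108, 105, 106)   # assumed: first column
electric <- c(97, 95, 94, 93, 91, 95, 94)          # assumed: second column

res <- t.test(petrol, electric, alternative = "greater", var.equal = TRUE)
res$statistic          # pooled t on n1 + n2 - 2 = 12 df
qt(0.95, df = 12)      # one-sided 5% critical value
res$p.value
```

Under this column assignment the statistic is far above the critical value; if the columns were the other way round, the one-sided alternative would have to be reversed.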
19.8
23.2
22.1
22.0
21.5
22.2
20.9
21.2
22.0
21.6
21.0
21.6
22.3
21.9
21.0
22.0
20.3
22.9
20.9
22.8
Assuming the variances for each group are the same, is there any evidence at the 5% level to
suggest that the egg size differs between the two host species?
6
0.8849, 0.2119, 0.2881. Hint: if $X \sim N(\mu, \sigma^2)$, then $\mathrm{pr}\{X \le x\} = \Phi\left(\frac{x - \mu}{\sigma}\right)$.
7
From tables, x = 2.015.
8
$t_8(2.5\%) = 2.306$. $\mathrm{pr}\{T > 1.860\} = 0.05$, so $\mathrm{pr}\{T < -1.860\} = 0.05$ by symmetry. Thus $t = -1.860$.
If $T \sim t_{10}$, what is $\mathrm{pr}\{T < -2.228\}$? What is $\mathrm{pr}\{-2.228 < T < 2.228\}$?
Answer: 9
0.025, 0.95.
$n = 5$, $\bar{x} = 16.4$, $1.96\sigma/\sqrt{n} = 1.753$, so the 95% interval is $16.4 \pm 1.75 = (14.65, 18.15)$.
11
$n = 9$, $\bar{x} = 9.0$, $s^2 = 6.25$, $t_8(2.5\%) = 2.306$. Interval is $\bar{x} \pm t_8(2.5\%)\,s/\sqrt{n} = 9.0 \pm 1.92$.
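R code which could be used to obtain the required quantities:
x=c(10.3,9.4,9.9,7.5,11.7,3.4,7.8,11.0,10.0)
mean(x)       # Sample mean, 9.0.
var(x)        # Sample variance s^2, 6.25.
qt(0.975,8)   # Upper 2.5% point of the t distribution with 8 df, 2.306.
mean(x) + c(-1,1)*qt(0.975,8)*sd(x)/sqrt(length(x))   # 95% interval.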
12
The width of the interval is $2 \times 1.96\sigma/\sqrt{n}$. Thus we require $2 \times 1.96 \times 4/\sqrt{n} < 0.5$, so $n > 16^2 \times 1.96^2 = 983.45$; take $n = 984$.
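The sample-size condition can be solved directly in R. $\sigma = 4$ and the maximum width 0.5 are the values used in the inequality above, and `ceiling()` rounds up to the next whole observation.

```r
sigma <- 4      # standard deviation used in the answer
width <- 0.5    # required maximum width of the 95% interval

# 2 * 1.96 * sigma / sqrt(n) < width  =>  n > (2 * 1.96 * sigma / width)^2
n_bound <- (2 * 1.96 * sigma / width)^2
n_min   <- ceiling(n_bound)
c(bound = n_bound, n = n_min)
```

Using `qnorm(0.975)` in place of the tabulated 1.96 lowers the bound very slightly but still gives 984.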
13
$n = 4$, $\bar{x} = 4$, $\sigma^2 = 4$, $\mu_0 = 0$, $\sigma^2/n = 1$. Test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = 4$. Test rule is reject $H_0$ if $|z| > 1.96$. Thus reject $H_0$ at the 5% level.
14
Let $X$ be the number of sixes in 100 throws, so $X \sim \mathrm{Bin}(n = 100, \theta = 1/6)$ if $H_0$ is true. Approximately, $X \sim N(\mu = 16.667, \sigma^2 = 13.889)$ if $H_0$ is true. Test statistic is $z = \frac{x - 16.667}{\sqrt{13.889}} = 2.236$. Test rule is reject $H_0$ if $|z| > 1.96$, so reject $H_0$ at the 5% level.
Answer:
15
x     y
1.1   3.3
2.2   6.1
3.4   7.0
4.5   10.4
5.0   11.5
16
x     y
1.1   3.3
2.2   6.1
3.4   7.0
4.5   10.4
5.0   11.5
17
x     y
1.1   3.3
2.2   6.1
3.4   7.0
4.5   10.4
5.0   11.5
15
$n = 4$, $\bar{x} = 4$, $s^2 = 3.333$, $\mu_0 = 1$, $s^2/n = 0.8333$. Test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4 - 1}{\sqrt{0.8333}} = 3.286$. Test rule is
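16
$$\bar{x} = \frac{1}{n}\sum x_i = 3.24, \qquad s_x^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum x_i^2 - n\bar{x}^2\right) = 2.593,$$
$$\bar{y} = 7.66, \qquad s_y^2 = \frac{1}{n-1}\sum (y_i - \bar{y})^2 = \frac{1}{n-1}\left(\sum y_i^2 - n\bar{y}^2\right) = 11.033,$$
$$s_{xy} = \frac{1}{n-1}\sum (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1}\left(\sum x_iy_i - n\bar{x}\bar{y}\right) = 5.2645, \qquad r_{XY} = s_{xy}\Big/\sqrt{s_x^2 s_y^2} = 0.984.$$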
Check your answer using R!
x=c(1.1,2.2,3.4,4.5,5.0)
y=c(3.3,6.1,7.0,10.4,11.5)   # y values from the data table.
cor(x,y)                     # Sample correlation r_XY.
17
$\bar{x} = 3.24$, $\bar{y} = 7.66$, $s_x^2 = 2.593$, $s_y^2 = 11.033$, $s_{xy} = 5.2645$. Regression line is $y = \alpha + \beta x$, where $\hat\beta = s_{xy}/s_x^2 = 2.030$ and $\hat\alpha = \bar{y} - \hat\beta\bar{x} = 1.082$, so the fitted line is $y = 1.082 + 2.030x$. If $x_1 = 1.1$, predict $\hat{y}_1 = 3.315$. At $x = 1.1$, the residual is $r_1 = y_1 - \hat{y}_1 = 3.3 - 3.315 = -0.015$. If $x = 4$, predict $\hat{y} = 9.203$. Check your answers using R!
x=c(1.1,2.2,3.4,4.5,5.0)
y=c(3.3,6.1,7.0,10.4,11.5)        # y values from the data table.
lm(y~x)                           # Gives parameter estimates.
model=lm(y~x)                     # Stores regression model output as model.
residuals(model)[1]               # First residual value.
predict(model, data.frame(x=4))   # Predicted y at x = 4.
Answer:
19
          x = 0   x = 1   x = 2
y = 0     0.1     0.2     0.1
y = 1     0.1     0.0     0.0
y = 2     0.1     0.2     0.2
Obtain the marginal probabilities pX (x) and pY (y) for X and Y . Hence obtain E[X], E[Y ], Var[X],
Var[Y ]. Obtain cov(X, Y ) and corr(X, Y ).
Answer: 22
Yes, 3, 1.
$p_X(0) = 0.7$, $p_X(1) = 0.3$, $\mathrm{pr}\{Y = 0 \mid X = 1\} = \frac{2}{3}$, $\mathrm{pr}\{Y = 1 \mid X = 1\} = \mathrm{pr}\{X = 1 \cap Y = 1\}/\mathrm{pr}\{X = 1\} = \frac{1}{3}$.
$E[XY] = 0.1$. No.
21
$f_X(x) = \int f_{XY}(x, y)\,dy = 2x$ for $0 < x < 1$. $E[XY] = \frac{4}{9}$. Yes.
22
Marginal probabilities for $X$ are 0.3, 0.4, 0.3, and for $Y$ they are 0.4, 0.1, 0.5. $E[X] = 1$, $E[Y] = 1.1$, $\operatorname{Var}[X] = 0.6$, $\operatorname{Var}[Y] = 0.89$, $\operatorname{cov}(X, Y) = 0.1$, $\operatorname{corr}(X, Y) = 0.137$.
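The marginal and moment calculations can be scripted in R. The matrix layout below (rows $y = 0, 1, 2$, columns $x = 0, 1, 2$) follows the joint table above.

```r
p <- matrix(c(0.1, 0.2, 0.1,
              0.1, 0.0, 0.0,
              0.1, 0.2, 0.2),
            nrow = 3, byrow = TRUE,
            dimnames = list(y = as.character(0:2), x = as.character(0:2)))
xv <- 0:2; yv <- 0:2

pX <- colSums(p); pY <- rowSums(p)      # marginal distributions
EX <- sum(xv * pX); EY <- sum(yv * pY)
VX <- sum(xv^2 * pX) - EX^2
VY <- sum(yv^2 * pY) - EY^2
EXY   <- sum(outer(yv, xv) * p)         # outer(yv, xv)[i, j] = y_i * x_j
covXY <- EXY - EX * EY
corXY <- covXY / sqrt(VX * VY)
round(c(EX = EX, EY = EY, VX = VX, VY = VY, cov = covXY, corr = corXY), 3)
```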
23
$\operatorname{Var}[X] + \operatorname{cov}(X, Y) = 2.5$.
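24
$\sigma_X^2 - \sigma_Y^2 = 12$, $\sigma_X^2 + 2\sigma_{XY} + \sigma_Y^2 = 20$, $\sigma_X^2 - 2\sigma_{XY} + \sigma_Y^2 = 16$, so $2\sigma_X^2 + 2\sigma_Y^2 = 36$ and $4\sigma_{XY} = 4$. Thus $\sigma_X^2 = 15$, $\sigma_Y^2 = 3$, $\sigma_{XY} = 1$ and $\operatorname{corr}(X, Y) = 1/\sqrt{45}$.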
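25
$X \sim \mathrm{Bin}(n = 100, \theta = \frac{1}{6})$. Similarly for $Y$. $Z \sim \mathrm{Bin}(100, \theta = \frac{1}{3})$. $\operatorname{Var}[X] = \operatorname{Var}[Y] = 500/36$, $\operatorname{Var}[Z] = 200/9 = \sigma_X^2 + 2\sigma_{XY} + \sigma_Y^2$. Hence $\operatorname{cov}(X, Y) = -100/36$, so $\operatorname{corr}(X, Y) = -\frac{1}{5}$. Notice that $X$ and $Y$ are not uncorrelated: if you have a lot of ones, you would expect fewer twos!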
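$\operatorname{cov}(X, Y) = \operatorname{corr}(X, Y)\sqrt{\operatorname{Var}[X]\operatorname{Var}[Y]}$, so $\operatorname{cov}(X, Y) = 0.6$ and $\operatorname{cov}(X + 2Y, X - Y) = \operatorname{Var}[X] + \operatorname{cov}(X, Y) - 2\operatorname{Var}[Y] = 13.4$.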
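27
$X - Y \sim N(0, 25)$, so we want $\mathrm{pr}\{-5 < X - Y \le +5\}$. $\mathrm{pr}\{X - Y \le 5\} = \Phi(1) = 0.8413$, so $\mathrm{pr}\{X - Y > 5\} = 0.1587$, and by symmetry the answer is $1 - 2 \times 0.1587 = 0.6826$.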
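In R, `pnorm()` gives the probability directly, without rounding $\Phi(1)$ from tables; the small difference from 0.6826 is only table rounding.

```r
# X - Y ~ N(0, 25), so the standard deviation is 5
prob <- pnorm(5, mean = 0, sd = 5) - pnorm(-5, mean = 0, sd = 5)
round(prob, 4)
```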
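28
$$\sum_{i=1}^{n} (X_i - \mu) = n(\bar{X} - \mu).$$
Thus
$$(X_i - \bar{X})^2 = (X_i - \mu)^2 + (\bar{X} - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu)$$
and, summing over $i$,
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2.$$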
$\mathrm{pr}\{X < 50\} = \mathrm{pr}\{X \le 49\} \approx \Phi\left(\frac{49 + \frac{1}{2} - \mu}{\sigma}\right) = \Phi(-1.899) = 0.0288$. Notice that we have used a continuity correction.
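34
The number of heads is $X \sim \mathrm{Bin}(n = 100, \theta)$. Here $n = 100$, $X = 72$ is observed, and $\hat\theta = X/n = 72/100 = 0.72$. An approximate 95% confidence interval for $\theta$ is
$$\hat\theta \pm 1.96\sqrt{\frac{\hat\theta(1 - \hat\theta)}{n}} = 0.72 \pm 0.088.$$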
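A sketch of the same normal-approximation interval in R; `qnorm(0.975)` plays the role of 1.96.

```r
n <- 100; x <- 72
theta_hat <- x / n
half <- qnorm(0.975) * sqrt(theta_hat * (1 - theta_hat) / n)   # about 0.088
ci <- theta_hat + c(-1, 1) * half
round(ci, 3)
```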
Data source: http://www.thisismoney.co.uk/money/investing/article-1709914/Stock-market-predict
Outcome   Frequency
1         16
2         15
3         16
4         15
5         15
6         23
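$$z = \frac{\hat\theta_1 - \hat\theta_2}{\sqrt{\hat\theta(1 - \hat\theta)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}.$$
Reject $H_0$ at the 5% level if $|z| > 1.96$, so here accept the hypothesis that the two proportions are equal.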
36
Two binomial proportions again. $\hat\theta_1 = 52/1799 = 0.028905$, $\hat\theta_2 = 41/1433 = 0.028611$, $n_1 = 1799$, $n_2 = 1433$. The common estimated proportion is
$$\hat\theta = \frac{1799\hat\theta_1 + 1433\hat\theta_2}{3232} = 0.0288.$$
(This is very small, so the normal approximation is doubtful. In practice we would transform to give approximate normality.) The approximate test statistic is
$$z = \frac{|\hat\theta_1 - \hat\theta_2|}{\sqrt{\hat\theta(1 - \hat\theta)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 0.0496.$$
Reject $H_0$ at the 5% level if $|z| > 1.96$, so here accept the hypothesis that the two proportions are equal.
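The pooled two-proportion statistic can be computed in R as below. For a 2 × 2 table, Pearson's chi-squared statistic without continuity correction, as returned by `prop.test(..., correct = FALSE)`, equals $z^2$, which gives an independent check.

```r
x <- c(52, 41)        # successes in the two samples
n <- c(1799, 1433)    # sample sizes

theta  <- x / n
pooled <- sum(x) / sum(n)                                    # about 0.0288
z <- abs(theta[1] - theta[2]) /
  sqrt(pooled * (1 - pooled) * (1 / n[1] + 1 / n[2]))        # about 0.0496

chisq <- unname(prop.test(x, n, correct = FALSE)$statistic)  # equals z^2
c(z = z, z_squared = z^2, chisq = chisq)
```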
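$$z = \frac{\hat\theta_1 - \hat\theta_2}{\sqrt{\frac{\hat\theta_1(1 - \hat\theta_1)}{n_1} + \frac{\hat\theta_2(1 - \hat\theta_2)}{n_2}}} = \frac{0.200 - 0.275}{\sqrt{0.0017889 + 0.0014907}} = -1.31.$$
Test rule is reject $H_0$ if $|z| > 1.96$, so accept $H_0$ at the 5% level.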
38
From tables, x = 9.488.
39
Let $X$ denote the outcome of the die. We test whether $\mathrm{pr}\{X = i\} = 1/6$ for all $i$. The expected frequency for any outcome would then be $100 \times \frac{1}{6} = 16.667$.
Outcome i   Observed frequency O_i   Expected frequency E_i   (O_i − E_i)²/E_i
1           16                       16.67                    0.0267
2           15                       16.67                    0.1667
3           16                       16.67                    0.0267
4           15                       16.67                    0.1667
5           15                       16.67                    0.1667
6           23                       16.67                    2.407
                                                              sum = 2.960
Number of accidents   Frequency
0                     28
1                      5
2                      2
Test at the 5% level whether a Poisson distribution gives a good fit to the data. Why is the Poisson
distribution a suitable model for these data?
Answer: 40
          Survey A   Survey B   Total
Like      44         30          74
OK        23         20          43
Dislike   33         30          63
Total     100        80         180
Test whether the like/OK/dislike population proportions for the two surveys are equal.
Answer: 41
Number of cells is 6; number of estimated parameters is 0; number of constraints on expected frequencies is 1. Number of degrees of freedom is $k = 6 - 0 - 1 = 5$. Test statistic is $\chi^2_{\text{obs}} = 2.960$. Reject $H_0$ if $\chi^2_{\text{obs}} > \chi^2_5(5\%)$. As $\chi^2_5(5\%) = 11.071$, we accept the null hypothesis that the die is fair.
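R's `chisq.test()` gives the same goodness-of-fit statistic for the die, with the expected frequencies set by `p = rep(1/6, 6)`; `qchisq()` replaces the table value 11.071.

```r
obs <- c(16, 15, 16, 15, 15, 23)            # observed frequencies for faces 1-6
res <- chisq.test(obs, p = rep(1 / 6, 6))   # H0: fair die
res$statistic                               # 2.96 on 5 df
qchisq(0.95, df = 5)                        # 5% critical value, about 11.07
```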
40
$\bar{x} = 9/35 = 0.257$. The best-fitting Poisson distribution is $X \sim \mathrm{Poisson}(\lambda = 0.257)$. Fitted probabilities are
$$\mathrm{pr}\{X = x\} = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots.$$
Fitted frequencies are $E_x = 35 \times \mathrm{pr}\{X = x\}$ for $x = 0, 1, 2, \ldots$.
Number of accidents X   Observed frequency O_i   Expected frequency E_i   (O_i − E_i)²/E_i
0                       28                       27.06                    0.0324
1                        5                        6.959                   0.5516
2 or more                2                        0.977 (by difference)   1.0723
                                                                          sum = 1.656
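A sketch of the Poisson fit in R, assuming (as in the table) that the last class collects 2 or more accidents and takes its expected frequency by difference.

```r
x   <- 0:2                # 0, 1, and "2 or more" accidents
obs <- c(28, 5, 2)
n   <- sum(obs)           # 35

lambda <- sum(x * obs) / n              # 9/35, about 0.257
expct  <- n * dpois(0:1, lambda)        # expected frequencies for 0 and 1
expct  <- c(expct, n - sum(expct))      # "2 or more", by difference
chi2   <- sum((obs - expct)^2 / expct)  # about 1.656
round(rbind(observed = obs, expected = expct), 3)
```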
Expected frequencies:
          Survey A                 Survey B   Total
Like      (100 × 74)/180 = 41.11   32.89       74
OK        23.89                    19.11       43
Dislike   35.00                    28.00       63
Total     100                      80         180

Contributions (O_i − E_i)²/E_i:
          Survey A   Survey B
Like      0.2030     0.2538
OK        0.0331     0.0413
Dislike   0.1143     0.1429

$\sum_i (O_i - E_i)^2/E_i = 0.7883$. Reject $H_0$ if $\chi^2_{\text{obs}} > \chi^2_2(5\%)$. As $\chi^2_2(5\%) = 5.991$, we accept the hypothesis that the two surveys have the same overall proportions for each category.
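For the two surveys, `chisq.test()` on the 3 × 2 table reproduces the expected frequencies and the statistic; R applies no continuity correction to tables larger than 2 × 2.

```r
tab <- matrix(c(44, 30,
                23, 20,
                33, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(response = c("Like", "OK", "Dislike"),
                              survey = c("A", "B")))
res <- chisq.test(tab)
round(res$expected, 2)   # 41.11, 32.89 / 23.89, 19.11 / 35.00, 28.00
res$statistic            # about 0.788 on 2 df
qchisq(0.95, df = 2)     # 5% critical value, about 5.991
```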
Note that the null hypothesis is that the proportion liking is the same for surveys A and B, the proportion saying OK is the same for A and B, and the proportion disliking is the same for A and B.