Solution Key: ORF 245 - Fundamentals of Engineering Statistics Final Exam
Final Exam
Instructions: This exam is open book and open notes. Calculators are
allowed, but not computers or the use of statistical software packages. Write
all your work in the space provided after each question. Use the other side of
the page, if necessary. Explain as thoroughly and as clearly as possible all
your steps in answering each question. Full or partial credit can only be
granted if intermediate steps are clearly indicated.
Pledge: ______________________________________________________
_____________________________________________________________
Signature: ____________________________________________________
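$\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} c = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i - nc = 0 \;\Rightarrow\; nc = \sum_{i=1}^{n} x_i \;\Rightarrow\; c = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}.$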
b) (3 pts.) Using the result of part (a), which of the two quantities $\sum_{i=1}^{n} (x_i - \bar{x})^2$ and
$\sum_{i=1}^{n} (x_i - \mu)^2$ will be smaller than the other (assuming $\bar{x} \neq \mu$)?
$\sum_{i=1}^{n} (x_i - \bar{x})^2 < \sum_{i=1}^{n} (x_i - \mu)^2.$
c) (3 pts.) Let a and b be constants and let $y_i = a x_i + b$ for $i = 1, 2, \ldots, n$. What are the
relationships between $\bar{x}$ and $\bar{y}$ and between $s_x^2$ and $s_y^2$? Show how you obtained
the results.
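$\bar{y} = \frac{\sum y_i}{n} = \frac{\sum (a x_i + b)}{n} = \frac{a \sum x_i + nb}{n} = a\bar{x} + b.$
$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} = \frac{\sum (a x_i + b - (a\bar{x} + b))^2}{n-1} = \frac{\sum (a x_i - a\bar{x})^2}{n-1} = \frac{a^2 \sum (x_i - \bar{x})^2}{n-1} = a^2 s_x^2.$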
$\bar{y} = \frac{9}{5}(87.3) + 32 = 189.14$
$s_y = \sqrt{s_y^2} = \sqrt{\left(\frac{9}{5}\right)^2 (1.04)^2} = \sqrt{3.5044} = 1.872$
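The transformation rule from part (c) can be checked numerically. The sketch below reuses the constants that appear in the worked answer above (a = 9/5, b = 32, mean 87.3, standard deviation 1.04); the five-point sample `xs` is hypothetical and only serves to confirm the rule.

```python
import statistics

# Linear transformation y = a*x + b applied to summary statistics.
a, b = 9 / 5, 32          # constants read off the worked answer
x_bar, s_x = 87.3, 1.04   # summary values read off the worked answer

y_bar = a * x_bar + b     # the mean transforms as a*x_bar + b
s_y = abs(a) * s_x        # the standard deviation scales by |a|; b drops out

# Sanity check of the rule on a small hypothetical sample (not exam data).
xs = [85.9, 86.8, 87.3, 88.1, 88.4]
ys = [a * x + b for x in xs]
mean_gap = statistics.mean(ys) - (a * statistics.mean(xs) + b)
var_gap = statistics.variance(ys) - a**2 * statistics.variance(xs)

print(f"y_bar = {y_bar:.2f}, s_y = {s_y:.3f}")
print(f"mean gap = {mean_gap:.2e}, variance gap = {var_gap:.2e}")
```

Both gaps should be zero up to floating-point rounding, whatever sample is used.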
2) Consider the following stem and leaf plot that shows the GPAs of 30 students
recently admitted to the graduate program in IEOR at UC Berkeley.
b) (3 pts.) Knowing that the sample mean is given by $\bar{x} = 3.7207$ and the sample
standard deviation is given by $s = 0.1457$, determine the proportion of the data
values that lies within $\bar{x} \pm 1.5s$.
$\bar{x} - 1.5s = 3.7207 - 1.5 \times 0.1457 = 3.50$
$\bar{x} + 1.5s = 3.7207 + 1.5 \times 0.1457 = 3.94$
$\Rightarrow p = \frac{25}{30} = 83.3\%$ (or $p = \frac{24}{30} = 80.0\%$)
c) (3 pts.) Determine the proportion of the data values that lies within $\bar{x} \pm 2s$.
$\bar{x} - 2s = 3.7207 - 2 \times 0.1457 = 3.43$
$\bar{x} + 2s = 3.7207 + 2 \times 0.1457 = 4.01$
$\Rightarrow p = \frac{30}{30} = 100\%$
d) (3 pts.) Do the data appear to be approximately normal? Explain. [Hint: use the
results obtained in parts (b) and (c).]
For a normal distribution:
$P[\mu - 1.5\sigma \le X \le \mu + 1.5\sigma] = P[-1.5 \le Z \le 1.5] = 86.6\%$
$P[\mu - 2\sigma \le X \le \mu + 2\sigma] = P[-2 \le Z \le 2] = 95.4\%$
The data do appear to be approximately normal, since they are symmetric around the
mean value and the proportions of values in the ranges $\bar{x} \pm 1.5s$ and $\bar{x} \pm 2s$ (parts b
and c above) approximately match the corresponding normal probabilities above.
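The two normal reference probabilities can be reproduced without tables. This is a minimal sketch using the standard library's `NormalDist`; the observed proportions are copied from parts (b) and (c), and the helper name `prob_within` is illustrative.

```python
from statistics import NormalDist

def prob_within(k: float) -> float:
    """P[-k <= Z <= k] for a standard normal Z."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

p_15 = prob_within(1.5)
p_20 = prob_within(2.0)

# Observed proportions from parts (b) and (c), for side-by-side comparison.
observed = {1.5: 25 / 30, 2.0: 30 / 30}
for k, p_normal in ((1.5, p_15), (2.0, p_20)):
    print(f"k = {k}: normal {p_normal:.1%}, observed {observed[k]:.1%}")
```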
Probability:
3) Suppose that distinct integer values are written on each of 3 cards. Suppose you are to
be offered these cards in a random order. When you are offered a card you must
either immediately accept it or reject it. If you accept a card, the process ends. If you
reject a card, then the next card (if a card remains) is offered. If you reject the first
two cards offered, then you must accept the final card.
a) (4 pts.) If you plan to accept the first card offered, what is the probability that you
will accept the highest valued card?
$P[\text{accept highest card}] = P[\text{1st card is highest}] = \frac{1}{3}$
b) (6 pts.) If you plan to reject the first card offered, and to then accept the second
card if and only if its value is greater than the value of the first card, what is the
probability that you will accept the highest valued card?
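$P[\text{accept highest card}] = P[\text{2nd card is highest} \mid \text{1st card is lowest}]\,P[\text{1st lowest}]$
$\quad + P[\text{2nd card is highest} \mid \text{1st card is middle}]\,P[\text{1st middle}]$
$\quad + P[\text{2nd card is lowest} \mid \text{1st card is middle}]\,P[\text{1st middle}]$
$= \frac{1}{2} \times \frac{1}{3} + \frac{1}{2} \times \frac{1}{3} + \frac{1}{2} \times \frac{1}{3} = \frac{3}{6} = \frac{1}{2}.$
(In the third case the second card is rejected, so the final card, which is the highest, must be accepted.)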
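The result of part (b) can be confirmed by enumerating the six equally likely orderings. In this sketch the card values 1, 2, 3 stand in for any three distinct integers, and `accepted_card` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import permutations

def accepted_card(order):
    """Reject the first card; accept the second only if it beats the first."""
    first, second, third = order
    return second if second > first else third

orders = list(permutations([1, 2, 3]))   # six equally likely orderings
wins = sum(1 for order in orders if accepted_card(order) == 3)
p_highest = Fraction(wins, len(orders))

print(p_highest)
```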
4) Each of 2 balls is painted black or gold and then placed in an urn. Suppose that each
ball is colored black with probability 1/2, and that these events are independent.
a) (6 pts.) Suppose that you obtain information that the gold paint has been used
(and thus at least one of the balls is painted gold). Compute the conditional
probability that both balls are painted gold.
b) (4 pts.) Suppose, now, that the urn tips over and 1 ball falls out. It is painted gold.
What is the probability that both balls are gold in this case? Explain.
$P[\text{other is gold} \mid \text{one is gold}] = P[\text{other is gold}] = \frac{1}{2}$,
because of the independence of the painting events.
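The difference between the two conditioning events in parts (a) and (b) can be made concrete by enumeration. This sketch assumes, for part (b), that each of the two balls is equally likely to be the one that falls out; that assumption is not stated in the question.

```python
from fractions import Fraction
from itertools import product

# Four equally likely colourings of (ball 0, ball 1): B = black, G = gold.
colourings = list(product("BG", repeat=2))
both_gold = ("G", "G")

# Part (a): condition on "at least one ball is gold".
some_gold = [c for c in colourings if "G" in c]
p_a = Fraction(sum(c == both_gold for c in some_gold), len(some_gold))

# Part (b): condition on "the ball that fell out is gold".
# Assumption: each ball is equally likely to fall out, so the eight
# (colouring, fallen-ball index) pairs are equally likely.
outcomes = [(c, k) for c in colourings for k in (0, 1)]
fell_gold = [(c, k) for c, k in outcomes if c[k] == "G"]
p_b = Fraction(sum(c == both_gold for c, _ in fell_gold), len(fell_gold))

print(p_a, p_b)
```

Knowing only that gold paint was used rules out one colouring out of four, while seeing a specific gold ball leaves the other ball's colour untouched.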
Random Variables:
5) A large metropolitan transit authority has a fleet of 1000 buses, and it is observed
that, on average, there is a 0.4% chance that a bus will break down on any one day.
a) (5 pts.) If the maintenance department of the transit organization has sufficient
capacity to cope with 6 breakdowns on any day, calculate, correct to four
significant digits, the probability that on any day there will be insufficient staff to
attend to all the breakdowns occurring on that day.
Using the Poisson approximation, instead of direct calculation of the binomial probabilities:
$P[X = x] \approx \frac{e^{-4} 4^x}{x!} \;\Rightarrow\; 1 - P[X \le 6] = 1 - \sum_{x=0}^{6} \frac{e^{-4} 4^x}{x!}$
$= 1 - e^{-4}\left[1 + 4 + \frac{4^2}{2} + \frac{4^3}{6} + \frac{4^4}{24} + \frac{4^5}{120} + \frac{4^6}{720}\right] = 1 - 0.8893 = 11.07\%.$
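The Poisson sum above, with rate 1000 × 0.004 = 4, is easy to reproduce and to compare against the exact binomial tail that the approximation replaces. A minimal sketch using only the standard library:

```python
import math

n, p = 1000, 0.004      # fleet size and daily breakdown probability
lam = n * p             # Poisson rate used in the approximation
capacity = 6            # breakdowns the maintenance staff can handle

# Poisson approximation: P[X > 6] = 1 - sum_{x=0}^{6} e^{-lam} lam^x / x!
poisson_cdf = sum(math.exp(-lam) * lam**x / math.factorial(x)
                  for x in range(capacity + 1))
poisson_tail = 1 - poisson_cdf

# Exact binomial tail, for comparison with the approximation.
binom_cdf = sum(math.comb(n, x) * p**x * (1 - p)**(n - x)
                for x in range(capacity + 1))
binom_tail = 1 - binom_cdf

print(f"Poisson: {poisson_tail:.4f}, exact binomial: {binom_tail:.4f}")
```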
b) (5 pts.) If the probability of having insufficient staff on any day were actually
11.1%, let Y be the number of days that would have to go by for the organization
to experience 3 days of insufficient maintenance capacity (note that Y ≥ 3 ). What
are the expected value and the standard deviation of Y ?
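No worked answer for this part appears in the text. One standard way to approach it, offered here as a sketch rather than as the key's own solution, is to treat Y as a negative binomial count of trials (days) needed to reach r = 3 successes with success probability p = 0.111, so that E[Y] = r/p and SD[Y] = sqrt(r(1 - p))/p.

```python
import math

r = 3        # number of insufficient-capacity days to wait for
p = 0.111    # daily probability of insufficient capacity (from the question)

# Negative binomial, counting trials until the r-th success:
#   E[Y] = r / p,  Var[Y] = r * (1 - p) / p**2
mean_y = r / p
sd_y = math.sqrt(r * (1 - p)) / p

print(f"E[Y] = {mean_y:.2f} days, SD[Y] = {sd_y:.2f} days")
```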
6) Each of two detergents, A and B, is marketed in cartons which nominally contain 25
oz of powder. However, it is observed that 15% of the cartons of A contain less than
24.5 oz and 6% more than 26 oz of detergent. On the other hand, 5% of the cartons of
B contain less than 24 oz and 8% more than 26.5 oz. Assume that the weight
distributions for both detergents are normal and that they are independent from each
other. Assume also the price and the quality of the detergents are the same.
a) (5 pts.) Determine which detergent gives, on average, better value. [Hint:
compare the average weights of the cartons.]
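$A \sim N(\mu_A, \sigma_A^2)$ and $B \sim N(\mu_B, \sigma_B^2)$
$P[A < 24.5] = 0.15 \Rightarrow \Phi\left(\frac{24.5 - \mu_A}{\sigma_A}\right) = 0.15 \Rightarrow 24.5 - \mu_A = -1.0364\,\sigma_A$
$P[A > 26] = 0.06 \Rightarrow \Phi\left(\frac{26 - \mu_A}{\sigma_A}\right) = 0.94 \Rightarrow 26 - \mu_A = 1.5548\,\sigma_A$
Thus: $\mu_A = 25.100$ and $\sigma_A = 0.5789$
$P[B < 24] = 0.05 \Rightarrow \Phi\left(\frac{24 - \mu_B}{\sigma_B}\right) = 0.05 \Rightarrow 24 - \mu_B = -1.6449\,\sigma_B$
$P[B > 26.5] = 0.08 \Rightarrow \Phi\left(\frac{26.5 - \mu_B}{\sigma_B}\right) = 0.92 \Rightarrow 26.5 - \mu_B = 1.4051\,\sigma_B$
Thus: $\mu_B = 25.348$ and $\sigma_B = 0.8197$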
Since $\mu_B > \mu_A$, detergent B gives, on average, better value than A.
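Each pair of quantile equations above is a 2×2 linear system in the mean and standard deviation. The sketch below solves it with `NormalDist.inv_cdf` instead of a z-table; `solve_normal` is an illustrative helper name, not part of the exam.

```python
from statistics import NormalDist

def solve_normal(x_lo, p_below, x_hi, p_above):
    """Recover (mu, sigma) from P[X < x_lo] = p_below and P[X > x_hi] = p_above."""
    z = NormalDist()
    z_lo = z.inv_cdf(p_below)        # x_lo - mu = z_lo * sigma
    z_hi = z.inv_cdf(1 - p_above)    # x_hi - mu = z_hi * sigma
    sigma = (x_hi - x_lo) / (z_hi - z_lo)
    mu = x_lo - z_lo * sigma
    return mu, sigma

mu_a, sigma_a = solve_normal(24.5, 0.15, 26.0, 0.06)
mu_b, sigma_b = solve_normal(24.0, 0.05, 26.5, 0.08)

print(f"A: mu = {mu_a:.3f}, sigma = {sigma_a:.4f}")
print(f"B: mu = {mu_b:.3f}, sigma = {sigma_b:.4f}")
```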
b) (5 pts.) Find the probability that detergent B gives better value than detergent A.
$B - A \sim N(\mu_B - \mu_A,\; \sigma_A^2 + \sigma_B^2)$
Thus: $P[B > A] = P[B - A > 0] = P\left[Z > \frac{0 - \mu_B + \mu_A}{\sqrt{\sigma_A^2 + \sigma_B^2}}\right]$
$= 1 - \Phi\left(\frac{25.100 - 25.348}{\sqrt{.5789^2 + .8197^2}}\right) = 1 - \Phi(-.2471)$
$= 1 - .4024 = 59.76\%.$
c) (5 pts.) Find the probabilities that the contents of randomly selected cartons of A
and B will be less than the nominal weight.
$P[A < 25] = P\left[Z < \frac{25 - 25.100}{.5789}\right] = \Phi(-.1727) = 43.14\%$
$P[B < 25] = P\left[Z < \frac{25 - 25.348}{.8197}\right] = \Phi(-.4245) = 33.56\%$
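Parts (b) and (c) reduce to three normal CDF evaluations. This sketch plugs in the rounded parameter values from part (a), so its last digit may differ slightly from a table lookup.

```python
from math import sqrt
from statistics import NormalDist

# Rounded parameter estimates from part (a).
mu_a, sigma_a = 25.100, 0.5789
mu_b, sigma_b = 25.348, 0.8197

# Part (b): B - A is normal with mean mu_b - mu_a and variance sigma_a^2 + sigma_b^2.
diff = NormalDist(mu_b - mu_a, sqrt(sigma_a**2 + sigma_b**2))
p_b_beats_a = 1 - diff.cdf(0)

# Part (c): probability that a carton holds less than the nominal 25 oz.
p_a_under = NormalDist(mu_a, sigma_a).cdf(25)
p_b_under = NormalDist(mu_b, sigma_b).cdf(25)

print(f"P[B > A] = {p_b_beats_a:.4f}")
print(f"P[A < 25] = {p_a_under:.4f}, P[B < 25] = {p_b_under:.4f}")
```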
Joint Probability Distributions:
7) (10 pts.) Five individuals, including A and B, take seats around a circular table in a
completely random fashion. Suppose the seats are numbered 1,…,5. Let X = A’s seat
number and Y = B’s seat number. If A sends a written message around the table to B
in the direction in which they are closest, how many individuals (including A and B)
would you expect to handle the message?
Let Z be the # of individuals handling the message. Consider the following table of
values of Z for all possible joint values of X and Y:
Z               Y
         1   2   3   4   5
X   1    -   2   3   3   2
    2    2   -   2   3   3
    3    3   2   -   2   3
    4    3   3   2   -   2
    5    2   3   3   2   -
Each cell $(i, j)$ with $i \neq j$ has $P[X = i, Y = j] = p_{ij} = P[Y = j \mid X = i]\,P[X = i] = \frac{1}{4} \times \frac{1}{5} = \frac{1}{20}$.
Thus: $E[Z] = 10 \times 2 \times \frac{1}{20} + 10 \times 3 \times \frac{1}{20} = 2.5.$
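The table and the expectation can be regenerated by enumerating the 20 equally likely seat pairs. In this sketch `handlers` is an illustrative helper that counts the people on the shorter arc between the two seats, endpoints included.

```python
from fractions import Fraction

SEATS = 5

def handlers(x: int, y: int) -> int:
    """People handling the message on the shorter way round, including A and B."""
    gap = abs(x - y)
    return min(gap, SEATS - gap) + 1

# All ordered pairs of distinct seats, each with probability 1/20.
pairs = [(x, y) for x in range(1, SEATS + 1)
         for y in range(1, SEATS + 1) if x != y]
expected_z = Fraction(sum(handlers(x, y) for x, y in pairs), len(pairs))

print(expected_z)
```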
Statistical Estimation:
8) (10 pts.) A binomial population ($X_1$) gives rise to observations in two distinct classes
A and a with probabilities $1 - \theta$ and $\theta$ respectively, where $\theta$ is an unknown
parameter such that $0 < \theta < 1$. A second binomial population ($X_2$), independent from
the first, gives rise to observations in the same classes, but with probabilities $1 - \theta^2$
and $\theta^2$ respectively. Random samples of sizes $n_1$ and $n_2$ are taken from the first and
second populations, respectively, and the observed frequencies in the A and a classes
are:
              A       a
1st sample    x11     x12     x11 + x12 = n1
2nd sample    x21     x22     x21 + x22 = n2
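$p(x_{11}, x_{12}, x_{21}, x_{22}; \theta) = \binom{n_1}{x_{12}} \theta^{x_{12}} (1 - \theta)^{x_{11}} \binom{n_2}{x_{22}} \theta^{2 x_{22}} (1 - \theta^2)^{x_{21}}$
$\ln p(\theta) = \ln\binom{n_1}{x_{12}} + x_{12} \ln\theta + x_{11} \ln(1 - \theta) + \ln\binom{n_2}{x_{22}} + 2 x_{22} \ln\theta + x_{21} \ln(1 - \theta^2)$
$\frac{d \ln p(\theta)}{d\theta} = 0 \;\Rightarrow\; \frac{x_{12}}{\hat{\theta}} - \frac{x_{11}}{1 - \hat{\theta}} + \frac{2 x_{22}}{\hat{\theta}} - \frac{2 x_{21} \hat{\theta}}{1 - \hat{\theta}^2} = 0$
$(x_{12} + 2 x_{22})(1 - \hat{\theta}^2) - x_{11}(\hat{\theta} + \hat{\theta}^2) - 2 x_{21} \hat{\theta}^2 = 0$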
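The text stops at the estimating equation. Collecting powers of the estimate turns it into the quadratic $(n_1 + 2 n_2)\hat{\theta}^2 + x_{11}\hat{\theta} - (x_{12} + 2 x_{22}) = 0$, whose positive root lies in [0, 1]. The sketch below takes that extra step; the function names and the sample counts are hypothetical, chosen so that the estimate comes out at 0.5.

```python
import math

def mle_theta(x11, x12, x21, x22):
    """Positive root of (n1 + 2*n2)*t^2 + x11*t - (x12 + 2*x22) = 0."""
    n1, n2 = x11 + x12, x21 + x22
    a = n1 + 2 * n2
    b = x11
    c = -(x12 + 2 * x22)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def score(theta, x11, x12, x21, x22):
    """Derivative of the log-likelihood, as written in the derivation above."""
    return (x12 / theta - x11 / (1 - theta)
            + 2 * x22 / theta - 2 * x21 * theta / (1 - theta**2))

# Hypothetical counts (x11, x12, x21, x22), not taken from the exam.
counts = (50, 50, 75, 25)
theta_hat = mle_theta(*counts)

print(theta_hat, score(theta_hat, *counts))
```

The score evaluated at the root should vanish, which confirms that the quadratic and the derivative condition agree.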
Confidence Intervals and Tests of Hypothesis:
9) A certain professor designed a Java program to randomly “pop” a quiz with probability
p. During the course the “popper” was used 22 times and in 6 of those a pop quiz
did take place. Assume that each usage of the popper was independent from all the
others.
a) (4 pts.) Your estimate of the probability of a quiz is ___27.3%__, give or take
___9.5%____.
b) (4 pts.) Find the 95% confidence interval for the probability p . [Hint: use the
Agresti-Coull interval estimation.]
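95% Agresti-Coull CI: $\tilde{p} \pm z_{0.025} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}}$, where $\tilde{p} = \frac{x + 2}{\tilde{n}}$ and $\tilde{n} = n + 4$.
$\tilde{n} = 26$, $\tilde{p} = \frac{8}{26} = .3077$, $z_{0.025} = 1.96$, $\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}} = .0905$ ⇒ 95% CI: $.3077 \pm 1.96 \times .0905 = [.130, .485]$.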
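Parts (a) and (b) use the same standard-error formula with different inputs. This sketch computes both, taking the critical value 1.96 from the text rather than deriving it.

```python
import math

x, n = 6, 22     # quizzes popped and popper uses
z = 1.96         # z_{0.025}, as used in the text

# Part (a): plain sample proportion and its standard error.
p_hat = x / n
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)

# Part (b): Agresti-Coull adjustment (add 2 successes and 2 failures).
n_tilde = n + 4
p_tilde = (x + 2) / n_tilde
se_tilde = math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
lower, upper = p_tilde - z * se_tilde, p_tilde + z * se_tilde

print(f"p_hat = {p_hat:.3f} +/- {se_hat:.3f}")
print(f"Agresti-Coull 95% CI: [{lower:.3f}, {upper:.3f}]")
```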
c) (2 pts.) What sample size would be needed to obtain a 95% confidence interval
with width ±0.1 ?
d) (4 pts.) Can you say at the 10% significance level that the probability p is
different from 1/3? Clearly state your hypothesis and answer the question by
computing a P-value.
$H_0: p = \frac{1}{3}$ vs. $H_1: p \neq \frac{1}{3}$
Under $H_0$: $X \sim \text{Binom}(22, 1/3)$
Thus: P-value $= 2 P[\hat{p} \le 6/22 \mid H_0] = 2 P[X \le 6] =$ (using a calculator) $= 2 \times .362$
⇒ P-value = 72.4% ⇒ cannot reject $H_0$.
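The "using a calculator" step is an exact binomial CDF. A minimal sketch of that computation, doubling the lower tail as the solution does:

```python
import math

n, x_obs, p0 = 22, 6, 1 / 3
alpha = 0.10

def binom_cdf(k: int, n: int, p: float) -> float:
    """P[X <= k] for X ~ Binom(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

lower_tail = binom_cdf(x_obs, n, p0)
p_value = min(1.0, 2 * lower_tail)   # two-sided: double the lower tail
reject = p_value < alpha

print(f"P[X <= 6] = {lower_tail:.3f}, P-value = {p_value:.3f}, reject H0: {reject}")
```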
10) Consider an experiment in which half of the individuals in a group of 50 post-
menopausal overweight women were randomly assigned to a particular vegan diet,
and the other half received a diet based on the National Cholesterol Education
Program guidelines. The sample mean decrease in body weight, as well as the sample
standard deviations, for both diet subgroups are given in the following table:
a) (5 pts.) Estimate the difference between the true average weight losses for the two
diets with a 95% confidence interval.
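95% CI for $(\mu_1 - \mu_2)$: $(\bar{X} - \bar{Y}) \pm t_{0.025,48}\, SE$, where $SE = s_{pool} \sqrt{\frac{1}{m} + \frac{1}{n}}$ and $s_{pool} = \sqrt{\frac{(m - 1) s_1^2 + (n - 1) s_2^2}{m + n - 2}}$
$t_{0.025,48} = 2.011$, $s_{pool} = \sqrt{\frac{24 \times 3.2^2 + 24 \times 2.8^2}{48}} = 3.007$, $SE = 3.007 \sqrt{\frac{2}{25}} = 0.8505$
95% CI: $2 \pm 2.011 \times 0.8505 = [0.290, 3.710]$.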
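The pooled interval can be recomputed from the summary statistics. In this sketch the sample sizes, standard deviations and mean difference are read off the computation lines above, and the critical value 2.011 is taken from the text because the standard library has no t quantile function.

```python
import math

m = n = 25                 # subgroup sizes (half of 50 each)
s1, s2 = 3.2, 2.8          # sample standard deviations used in the solution
mean_diff = 2.0            # X-bar minus Y-bar used in the solution
t_crit = 2.011             # t_{0.025, 48}, from the text

s_pool = math.sqrt(((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2))
se = s_pool * math.sqrt(1 / m + 1 / n)
lower, upper = mean_diff - t_crit * se, mean_diff + t_crit * se

print(f"s_pool = {s_pool:.3f}, SE = {se:.4f}")
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```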
b) (5 pts.) Does it appear that the true average weight loss for the vegan diet exceeds
that for the control diet by more than 1 kg? Carry out an appropriate test of
hypothesis at the significance level 0.05 based on calculating a P-value.
$H_0: \mu_1 - \mu_2 \le 1$ vs. $H_1: \mu_1 - \mu_2 > 1$
$t_{obs} = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{SE} = \frac{2 - 1}{.8505} = 1.176$
P-value $= P[T_{48} \ge t_{obs}] = P[T_{48} \ge 1.176] = 12.3\%$ ⇒ cannot reject $H_0$.
Thus it does not appear that the weight loss of the vegan diet exceeds that
of the control diet by more than 1 kg.
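The P-value above needs the upper tail of a t distribution with 48 degrees of freedom. Since the standard library has no t CDF, this sketch integrates the density numerically with Simpson's rule; `t_pdf` and `t_upper_tail` are illustrative helpers, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P[T >= t] for t >= 0: 0.5 minus the Simpson integral of the density on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_obs = (2.0 - 1.0) / 0.8505           # (mean difference - Delta_0) / SE
p_value = t_upper_tail(t_obs, 48)

print(f"t_obs = {t_obs:.3f}, P-value = {p_value:.3f}")
```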
Correlation and Linear Regression:
11) The grades in the midterm exam of a certain class were not good. So the professor
decided to offer the students the chance to retake the same exam at home. He then
averaged the in-class grade with the home-retake grade in order to obtain the final
midterm grade. His claim was that the final midterm grades would be highly linearly
correlated with the in-class grades and thus this would be a fair, but less arbitrary,
way of curving the initial grades. Let x represent the in-class grades and y represent
the final midterm grades. The following R output shows many of the results of fitting
the model $Y = \beta_0 + \beta_1 x + \varepsilon$ to the 126 $(x_i, y_i)$ data points.
a) (6 pts.) Fill in the blanks in the tables above by computing the following values:
the t-statistic associated with the estimate $\hat{\beta}_0$, the standard error associated with the
estimate $\hat{\beta}_1$, the coefficient of determination, the regression sum of squares, the
mean sums of squares (regression and residuals), and the value of the F statistic.
Show your computations.
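$t_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \;\Rightarrow\; SE(\hat{\beta}_1) = \frac{.57602}{42.70} = .01349$
$R^2 = 1 - \frac{SSE}{S_{yy}} = 1 - \frac{988.3}{15518.4} = 0.9363$
$SS_{Reg} = S_{yy} - SSE = 15518.4 - 988.3 = 14530.1$
$MSS_{Reg} = \frac{SS_{Reg}}{k} = \frac{14530.1}{1} = 14530.1$
$MSSE = \frac{SSE}{n - k - 1} = \frac{988.3}{124} = 7.9702$
$F = \frac{MSS_{Reg}}{MSSE} = \frac{14530.1}{7.9702} = 1823.1$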
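The blanks in the regression output follow from a handful of ratios. This sketch recomputes them from the numbers quoted in the solution (slope estimate, its t-statistic, SSE and S_yy); the variable names are illustrative.

```python
n, k = 126, 1                            # observations and number of predictors
beta1_hat, t_beta1 = 0.57602, 42.70      # slope estimate and its t-statistic
sse, s_yy = 988.3, 15518.4               # residual and total sums of squares

se_beta1 = beta1_hat / t_beta1           # from t = estimate / SE
r_squared = 1 - sse / s_yy
ss_reg = s_yy - sse
ms_reg = ss_reg / k
ms_err = sse / (n - k - 1)
f_stat = ms_reg / ms_err

print(f"SE(beta1) = {se_beta1:.5f}, R^2 = {r_squared:.4f}")
print(f"SSReg = {ss_reg:.1f}, MSSE = {ms_err:.4f}, F = {f_stat:.1f}")
```

With a single predictor the F statistic should be close to the square of the slope's t-statistic (42.70² ≈ 1823.3), which gives a quick consistency check.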
b) (2 pts.) What is the sample correlation between the in-class grades and the final
midterm grades?
$r = \sqrt{R^2} = \sqrt{0.9363} = 0.9676.$
c) (4 pts.) Is there substantial evidence that the population correlation between in-
class and final midterm grades is at least 0.9? [Hint: this is a hypothesis test …]
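No worked answer for this part appears in the text. One common approach, given here only as a sketch under stated assumptions, is the Fisher z-transformation test of $H_0: \rho \le 0.9$ against $H_1: \rho > 0.9$, which treats $\operatorname{atanh}(r)$ as approximately normal with standard deviation $1/\sqrt{n - 3}$. The choice of hypotheses and of this test is an assumption, not the key's stated method.

```python
import math
from statistics import NormalDist

n = 126          # number of (x, y) pairs
r = 0.9676       # sample correlation from part (b)
rho0 = 0.9       # boundary value under H0 (assumed one-sided test)

# Fisher z-transformation: atanh(r) is approximately N(atanh(rho), 1/(n - 3)).
z_obs = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
p_value = NormalDist().cdf(-z_obs)   # upper-tail probability P[Z >= z_obs]

print(f"z_obs = {z_obs:.2f}, P-value = {p_value:.2e}")
```

Under these assumptions the statistic is far in the upper tail, so the data would support a population correlation above 0.9.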
d) (2 pts.) What were the assumptions made about the errors $\varepsilon_i$ in the linear model
$Y = \beta_0 + \beta_1 x + \varepsilon$ above?
Assumptions:
1. $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are i.i.d. random variables
2. $\varepsilon_i \sim N(0, \sigma^2)$, $\forall i$
e) (2 pts.) Based on the plot of residuals versus fitted values below, do you think that
the assumptions stated in part (d) hold for the data in question? Explain.
f) (4 pts.) Suppose that you wanted to use the calibrated model above to predict the
final midterm grade corresponding to an in-class grade of 50. Find the 90%
prediction interval for the final midterm grade. Note that $\bar{x} = 42.913$.
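$s_{pred} = \hat{\sigma} \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$, and $\hat{\beta}_1 = r \sqrt{\frac{S_{yy}}{S_{xx}}} \;\Rightarrow\; S_{xx} = \frac{r^2 S_{yy}}{\hat{\beta}_1^2} = \frac{.9676^2 \times 15518.4}{.57602^2} = 43788.85$
$s_{pred} = 2.823 \sqrt{1 + \frac{1}{126} + \frac{(50 - 42.913)^2}{43788.85}} = 2.836$
Thus: 90% prediction interval $= 71.134 \pm 1.657 \times 2.836 = [66.43, 75.83]$.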
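The prediction interval chains three small computations: S_xx from the slope and correlation, the prediction standard error, and the interval itself. This sketch takes the fitted value 71.134 and the critical value 1.657 directly from the text rather than recomputing them.

```python
import math

n = 126
x_star, x_bar = 50.0, 42.913        # new in-class grade and sample mean
sigma_hat = 2.823                   # residual standard error, sqrt(MSSE)
r, s_yy, beta1_hat = 0.9676, 15518.4, 0.57602
y_hat = 71.134                      # fitted value at x* = 50, from the text
t_crit = 1.657                      # t_{0.05, 124}, from the text

# beta1_hat = r * sqrt(S_yy / S_xx)  =>  S_xx = r^2 * S_yy / beta1_hat^2
s_xx = r**2 * s_yy / beta1_hat**2
s_pred = sigma_hat * math.sqrt(1 + 1 / n + (x_star - x_bar)**2 / s_xx)
lower, upper = y_hat - t_crit * s_pred, y_hat + t_crit * s_pred

print(f"S_xx = {s_xx:.2f}, s_pred = {s_pred:.3f}")
print(f"90% prediction interval: [{lower:.2f}, {upper:.2f}]")
```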