STAT 22200 Spring 2021 Homework 1A
(25 points in total) Refer to Exercise 3.1 on p.60 in Oehlert’s textbook but do the following parts instead.
1. The table below gives the mean and the SD of the response (liver weight as a percentage of body weight)
and the size for the four diet groups.
Diet                         1        2        3        4
Mean $\bar{y}_{i\bullet}$    3.7457   3.5800   3.5983   3.9225
SD $s_i$                     0.2840   0.1821   0.0962   0.1971
Size $n_i$                   7        8        6        8
Using only the data summary above (not the raw data), calculate
Answer:
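(a) [3pts = 1pt for $\bar{y}_{\bullet\bullet}$ + 2pts for $SS_{trt}$] As the sizes of the four groups are not equal, the grand mean is
not a simple average of the four group means but the size-weighted average
$$\bar{y}_{\bullet\bullet} = \frac{n_1\bar{y}_{1\bullet} + n_2\bar{y}_{2\bullet} + n_3\bar{y}_{3\bullet} + n_4\bar{y}_{4\bullet}}{n_1 + n_2 + n_3 + n_4} = \frac{7 \times 3.7457 + 8 \times 3.5800 + 6 \times 3.5983 + 8 \times 3.9225}{7 + 8 + 6 + 8} \approx 3.7183$$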
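The weighted average above, and the treatment sum of squares that part (a) also asks for, can be checked with a few lines of R. This is a minimal sketch that uses only the summary table; the variable names are illustrative, and the formula $SS_{trt} = \sum_i n_i(\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2$ is the standard one-way ANOVA definition rather than something quoted from this solution.

```r
# Group means and sizes copied from the summary table.
means <- c(3.7457, 3.5800, 3.5983, 3.9225)
n     <- c(7, 8, 6, 8)

# Grand mean: group means weighted by group size.
grand_mean <- weighted.mean(means, w = n)

# Treatment sum of squares: size-weighted squared deviations
# of the group means from the grand mean.
ss_trt <- sum(n * (means - grand_mean)^2)

print(round(c(grand_mean = grand_mean, ss_trt = ss_trt), 4))
```

Because the inputs are rounded to four decimals, the result can differ in the last digits from the sum of squares that R reports on the raw data.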
(b) [3pts]
$$SSE = \sum_{i=1}^{g}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2 = \sum_{i=1}^{g} (n_i - 1)s_i^2$$
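The identity above means SSE can be recovered from the group SDs alone. A minimal sketch, assuming the rounded SDs and sizes from the summary table (variable names are illustrative):

```r
# Group SDs and sizes copied from the summary table.
sds <- c(0.2840, 0.1821, 0.0962, 0.1971)
n   <- c(7, 8, 6, 8)

# Each group contributes (n_i - 1) * s_i^2 to the error sum of squares.
sse <- sum((n - 1) * sds^2)

# Error degrees of freedom N - g and the resulting mean square error.
df_error <- sum(n) - length(n)
mse      <- sse / df_error

print(round(c(sse = sse, df_error = df_error, mse = mse), 4))
```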
(c) [4pts. Please grade conditionally. If $SS_{trt}$ or SSE is wrong, please still give the credit for MS as
long as it is calculated as SS/df. Similarly, please give credit for F as long as it is calculated as
$MS_{trt}/MSE$, even if the value of $MS_{trt}$ or MSE or both are wrong.] The ANOVA table is
Source    df    SS    MS    F-statistic
(d) [2pts = 1pt for the P-value + 1pt for the conclusion] The P-value ≈ 0.01014 is found in R as follows:
> pf(4.66, 3, 25, lower.tail=FALSE)
[1] 0.01014091
In view of the small P-value, we can conclude that at least two of the diets have significantly different
effects on the liver weight of rats as a percentage of their body weight.
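Parts (a) through (d) can be assembled into a single ANOVA table from the summary statistics. The following is a sketch under the assumption that only the rounded means, SDs, and sizes are available; the object names (`anova_tab`, `f_stat`, and so on) are illustrative.

```r
# Summary statistics copied from the table in Problem 1.
means <- c(3.7457, 3.5800, 3.5983, 3.9225)
sds   <- c(0.2840, 0.1821, 0.0962, 0.1971)
n     <- c(7, 8, 6, 8)

g <- length(n)   # number of diet groups
N <- sum(n)      # total number of rats

# Sums of squares and degrees of freedom for treatment and error.
ss <- c(trt   = sum(n * (means - weighted.mean(means, n))^2),
        error = sum((n - 1) * sds^2))
df <- c(trt = g - 1, error = N - g)

# Mean squares are SS/df; F is MS_trt / MSE.
ms      <- ss / df
f_stat  <- unname(ms["trt"] / ms["error"])
p_value <- pf(f_stat, df[["trt"]], df[["error"]], lower.tail = FALSE)

anova_tab <- data.frame(Source = c("Diet", "Error"),
                        df = df, SS = round(ss, 4), MS = round(ms, 4),
                        F = c(round(f_stat, 2), NA),
                        row.names = NULL)
print(anova_tab)
print(p_value)
```

The F-statistic rounds to the 4.66 used in the `pf()` call above.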
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
diet 1 0.14903 0.149030 2.7493 0.1089
Residuals 27 1.46358 0.054207
The ANOVA table shows the df for the variable diet is 1 instead of g − 1 = 4 − 1 = 3. This is because “diet”
was coded as 1, 2, 3, 4 in the data file, so R regards diet as a numerical variable and fits a linear
regression model in which y changes linearly with diet,
$$y = \beta_0 + \beta_1 \cdot \text{diet} + \varepsilon,$$
which has only two parameters, $\beta_0$ and $\beta_1$. The variable diet is in fact categorical, and we want R to
fit a model with a separate mean for each category,
$$y_{ij} = \mu_i + \varepsilon_{ij},$$
which has four parameters $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$.
We can use the command as.factor() to let R know that diet is categorical.
Run the R code above and see if the output matches the ANOVA table you created in part (a).
Answer: [1pt]
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(diet) 3 0.57821 0.192736 4.6581 0.01016 *
Residuals 25 1.03440 0.041376
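The effect of `as.factor()` on the degrees of freedom can be seen on any data set with a numerically coded grouping variable. The sketch below uses simulated responses, not the homework data: the data frame `toy`, the seed, and the values passed to `rnorm()` are all made up for illustration, and only the group sizes are taken from Problem 1.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible

# Made-up data: diet coded 1-4 with the group sizes 7, 8, 6, 8.
toy <- data.frame(
  diet = rep(1:4, times = c(7, 8, 6, 8)),
  y    = rnorm(29, mean = 3.7, sd = 0.2)
)

# Numeric diet: simple linear regression, 1 df for diet.
fit_numeric <- lm(y ~ diet, data = toy)

# Factor diet: one mean per group, g - 1 = 3 df for diet.
fit_factor <- lm(y ~ as.factor(diet), data = toy)

df_numeric <- anova(fit_numeric)$Df
df_factor  <- anova(fit_factor)$Df
print(df_numeric)
print(df_factor)
```

Only the degrees of freedom are meaningful here; the sums of squares depend on the simulated values.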
3. Construct a 95% confidence interval for $\mu_3$, the mean liver weight as a percentage of body weight for rats
given the third diet. Show your work.
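[Give 0 pt if a one-sample CI is computed using only the data in the third group, as follows.]
$$\bar{y}_{3\bullet} \pm t_{\alpha/2,\,n_3-1}\frac{s_3}{\sqrt{n_3}} \approx 3.5983 \pm 2.5706 \times \frac{0.0962}{\sqrt{6}} \approx 3.5983 \pm 0.1010 = (3.4973,\ 3.6993) \quad\longleftarrow\ \text{[Wrong!]}$$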
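The intended answer is not shown at this point in the solution. Under the one-way ANOVA model, the usual interval for a group mean pools the error variance across all groups, replacing $s_3$ by $\sqrt{MSE}$ and using the error degrees of freedom. The sketch below assumes that this pooled interval is what is wanted and takes the MSE of 0.041376 on 25 df from the ANOVA output for Problem 2; the variable names are illustrative.

```r
# Third-diet summary and the pooled error estimate from the ANOVA output.
ybar3    <- 3.5983
n3       <- 6
mse      <- 0.041376
df_error <- 25

# Pooled-variance interval: t critical value on the error df.
half_width <- qt(0.975, df = df_error) * sqrt(mse / n3)
ci_pooled  <- ybar3 + c(-1, 1) * half_width

# One-sample interval (the approach marked wrong above), for contrast.
half_width_one <- qt(0.975, df = n3 - 1) * 0.0962 / sqrt(n3)

print(round(ci_pooled, 4))
print(round(c(pooled = half_width, one_sample = half_width_one), 4))
```

The pooled interval is wider here because the third group happens to have the smallest sample SD, so the pooled estimate of the error SD is larger than $s_3$.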
4. Make all pairwise comparisons of all 4 diets by t-tests. To present your answers, first create a table
with columns for: (1) the diets compared, (2) the mean differences, (3) the standard errors, and (4) the
t-statistics.
Finally, determine which pairs of diets are significantly different at the 5% significance level. Create an “underline
diagram” to summarize the result.
Answer: [Please grade conditionally. Please give full marks for the t-values as long as they are calculated
as (mean diff)/SE, even if the SEs are incorrect. Similarly, please grade the significant pairs and the
underline diagrams conditionally based on the t-values. Do not penalize the same mistake twice.]
From the ANOVA table in part (a) or (b), we see the MSE is 0.041376 with 25 degrees of freedom.
Diet pair   Estimate of difference                       Standard error (SE)                         t-value
i-k         $\bar{y}_{i\bullet} - \bar{y}_{k\bullet}$    $\sqrt{MSE(1/n_i + 1/n_k)}$                 $(\bar{y}_{i\bullet} - \bar{y}_{k\bullet})/SE$
                                                         [3pts]                                      [2pts]
2-1         3.5800 − 3.7457 = −0.1657                    $\sqrt{0.041376(1/8 + 1/7)}$ = 0.10528      −1.574
3-1         3.5983 − 3.7457 = −0.1474                    $\sqrt{0.041376(1/6 + 1/7)}$ = 0.11317      −1.302
4-1         3.9225 − 3.7457 = 0.1768                     $\sqrt{0.041376(1/8 + 1/7)}$ = 0.10528      1.679
3-2         3.5983 − 3.5800 = 0.0183                     $\sqrt{0.041376(1/6 + 1/8)}$ = 0.10985      0.167
4-2         3.9225 − 3.5800 = 0.3425                     $\sqrt{0.041376(1/8 + 1/8)}$ = 0.10171      3.368
4-3         3.9225 − 3.5983 = 0.3242                     $\sqrt{0.041376(1/8 + 1/6)}$ = 0.10985      2.951
[2pts for identifying the significant pairs] The critical value at the 5% level, $t_{\alpha/2,\,df=N-g} = t_{0.05/2,\,df=29-4=25}$,
can be found in R as follows:
We see only the pairs 4-2 and 4-3 are significantly different at the 5% level, as only their t-statistics 3.368 and
2.951 exceed the critical value 2.059539.
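The pairwise table and the comparison against the critical value can be reproduced from the summary statistics. This is a sketch rather than the code used in the original solution: it assumes the MSE of 0.041376 on 25 df quoted above, and the object names are illustrative.

```r
# Group means and sizes from the summary table, named by diet.
means <- c("1" = 3.7457, "2" = 3.5800, "3" = 3.5983, "4" = 3.9225)
n     <- c("1" = 7, "2" = 8, "3" = 6, "4" = 8)
mse   <- 0.041376
df_error <- sum(n) - length(n)   # N - g = 25

# All pairs of diets, one pair per column: k is the lower label, i the higher.
pairs <- combn(names(means), 2)
k <- pairs[1, ]
i <- pairs[2, ]

diff   <- unname(means[i] - means[k])
se     <- unname(sqrt(mse * (1 / n[i] + 1 / n[k])))
t_stat <- diff / se

# Two-sided 5% critical value on the error df.
t_crit <- qt(0.05 / 2, df = df_error, lower.tail = FALSE)

result <- data.frame(pair = paste(i, k, sep = "-"),
                     diff = round(diff, 4),
                     se   = round(se, 5),
                     t    = round(t_stat, 3),
                     significant = abs(t_stat) > t_crit)
print(result)
print(t_crit)
```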
Alternatively, one can also calculate the two-sided P-values for the 6 t-statistics above in R as follows.
We can also see that only the pairs 4-2 and 4-3 have P-values below 0.05.
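One way to obtain these two-sided P-values is to double the upper tail probability of |t| under a t distribution with 25 df. A minimal sketch, with the t-statistics typed in from the table (the vector names are illustrative):

```r
# t-statistics from the pairwise comparison table, named by diet pair.
t_stats <- c("2-1" = -1.574, "3-1" = -1.302, "4-1" = 1.679,
             "3-2" = 0.167,  "4-2" = 3.368,  "4-3" = 2.951)

# Two-sided P-value: twice the upper-tail probability of |t| with N - g = 25 df.
p_values <- 2 * pt(abs(t_stats), df = 25, lower.tail = FALSE)

print(round(p_values, 4))

# Pairs that are significant at the 5% level.
sig_pairs <- names(p_values)[p_values < 0.05]
print(sig_pairs)
```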
[2pts for the underline diagram] To make the underline diagram, first we order the four diets by mean
from low to high.
2 3 1 4
3.5800 3.5983 3.7457 3.9225
As diet 2 is not significantly different from diet 3 or 1, but is significantly different from diet 4, we draw a line
underneath diets 2, 3, and 1 but not 4. As diets 3 and 1 are not significantly different, but diets 3 and 4 are,
we would draw a line from diet 3 to 1 but not 4. However, as this line is contained in the line from 2 to 1,
the line from 3 to 1 can be omitted. Finally, we draw a line from diet 1 to 4 as they are not significantly
different.
2         3         1         4
3.5800    3.5983    3.7457    3.9225
--------------------------
                    ----------------
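The line-drawing rule described above can also be carried out mechanically. The sketch below is one possible implementation, not part of the original solution: it orders the diets by mean, extends a line from each diet to the right while every pair under the line is not significantly different, and then drops any line contained in a longer one. The helper `not_sig()` and the other names are hypothetical, and the MSE of 0.041376 on 25 df is taken from the ANOVA table.

```r
means <- c("1" = 3.7457, "2" = 3.5800, "3" = 3.5983, "4" = 3.9225)
n     <- c("1" = 7, "2" = 8, "3" = 6, "4" = 8)
mse   <- 0.041376
t_crit <- qt(0.975, df = sum(n) - length(n))

# Diets ordered by mean, from low to high.
ord <- names(sort(means))

# TRUE where diets a and b are NOT significantly different at the 5% level.
not_sig <- function(a, b) {
  abs(means[a] - means[b]) / sqrt(mse * (1 / n[a] + 1 / n[b])) <= t_crit
}

# For each starting diet, extend the line to the right while the next diet
# is not significantly different from every diet already under the line.
starts <- seq_along(ord)
ends <- sapply(starts, function(s) {
  e <- s
  while (e < length(ord) && all(not_sig(ord[s:e], ord[e + 1]))) e <- e + 1
  e
})

# Drop lines that lie entirely inside a line starting further left.
contained <- sapply(starts, function(i) any(starts < starts[i] & ends >= ends[i]))

underlines <- sapply(which(!contained),
                     function(i) paste(ord[starts[i]:ends[i]], collapse = " "))
print(underlines)
```

For these data the two remaining lines cover diets 2, 3, 1 and diets 1, 4, matching the diagram. A diet that differs significantly from all others would show up as a line of length one, which this sketch does not treat specially.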