
STAT 22200 Spring 2021 Homework 1A

(Due midnight on Wednesday, April 7, on Canvas)

(25 points in total) Refer to Exercise 3.1 on p.60 in Oehlert’s textbook but do the following parts instead.

1. The table below gives the mean and the SD of the response (liver weight as a percentage of body weight)
and the size for the four diet groups.

Diet                        1        2        3        4
Mean $\bar{y}_{i\bullet}$   3.7457   3.5800   3.5983   3.9225
SD $s_i$                    0.2840   0.1821   0.0962   0.1971
Size $n_i$                  7        8        6        8

Using only the data summary above (not the raw data), calculate

(a) the SStrt,
(b) the SSE, and
(c) create the ANOVA table. Show your work.
(d) What would you conclude about the four diets from the ANOVA F test? To find the p-value, you can
use the R command pf(F, df1, df2, lower.tail=FALSE), where F is the value of the F-statistic
and df1, df2 are the degrees of freedom for the numerator and the denominator, respectively.

Answer:

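(a) [3pts = 1pt for $\bar{y}_{\bullet\bullet}$ + 2pts for SStrt] As the sizes of the four groups are not equal, the grand mean is
not a simple average of the four group means but the weighted average

$$\bar{y}_{\bullet\bullet} = \frac{n_1\bar{y}_{1\bullet} + n_2\bar{y}_{2\bullet} + n_3\bar{y}_{3\bullet} + n_4\bar{y}_{4\bullet}}{n_1 + n_2 + n_3 + n_4} = \frac{7 \times 3.7457 + 8 \times 3.5800 + 6 \times 3.5983 + 8 \times 3.9225}{7 + 8 + 6 + 8} \approx 3.7183$$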

We then use this to compute the treatment sum of squares SStrt:

$$\begin{aligned}
SS_{trt} &= \sum_{i=1}^{g} n_i(\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 \\
&= 7(3.7457 - 3.7183)^2 + 8(3.5800 - 3.7183)^2 + 6(3.5983 - 3.7183)^2 + 8(3.9225 - 3.7183)^2 \\
&\approx 0.5783
\end{aligned}$$
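
The arithmetic in part (a) can be checked in R from the summary table alone. This is a minimal sketch; the variable names (ybar, n, grand_mean, ss_trt) are chosen here for illustration and do not come from the assignment.

```r
# Summary statistics copied from the table in Problem 1
ybar <- c(3.7457, 3.5800, 3.5983, 3.9225)  # group means
n    <- c(7, 8, 6, 8)                      # group sizes

# Grand mean: average of the group means weighted by group size
grand_mean <- sum(n * ybar) / sum(n)

# Treatment sum of squares: size-weighted squared deviations from the grand mean
ss_trt <- sum(n * (ybar - grand_mean)^2)

round(grand_mean, 4)
round(ss_trt, 4)
```

Because the grand mean is not rounded before squaring here, the result may differ from the hand calculation in the last displayed digit.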

(b) [3pts]

$$\begin{aligned}
SSE &= \sum_{i=1}^{g}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2 = \sum_{i=1}^{g} (n_i - 1)s_i^2 \\
&= (7 - 1)(0.2840)^2 + (8 - 1)(0.1821)^2 + (6 - 1)(0.0962)^2 + (8 - 1)(0.1971)^2 \approx 1.0343
\end{aligned}$$
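
The SSE in part (b) follows from the group SDs because each $(n_i - 1)s_i^2$ is the within-group sum of squares. A short R check, with illustrative variable names that are not part of the assignment:

```r
# Group SDs and sizes copied from the table in Problem 1
s <- c(0.2840, 0.1821, 0.0962, 0.1971)
n <- c(7, 8, 6, 8)

# (n_i - 1) * s_i^2 recovers each within-group sum of squares;
# adding them over the groups gives the SSE
sse <- sum((n - 1) * s^2)

round(sse, 4)
```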

(c) [4pts. Please grade conditionally. If SStrt or SSE is wrong, please still give credit for MS as
long as it is calculated as SS/df. Similarly, please give credit for F as long as it is calculated as
MStrt/MSE, even if the value of MStrt or MSE or both is wrong.] The ANOVA table is

Source      df                       SS       MS                                                F-statistic
Treatment   g−1 = 4−1 = 3 [1pt]      0.5783   MStrt = SStrt/(g−1) = 0.5783/3 ≈ 0.1928 [0.5pt]   F = MStrt/MSE = 0.1928/0.04137 ≈ 4.66 [1pt]
Error       N−g = 29−4 = 25 [1pt]    1.0343   MSE = SSE/(N−g) = 1.0343/25 ≈ 0.04137 [0.5pt]
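
The whole table in part (c) can be assembled in R from the summary statistics. The sketch below assumes nothing beyond the means, SDs, and sizes given in Problem 1; the data frame layout and all variable names are illustrative choices.

```r
# Summary statistics copied from the table in Problem 1
ybar <- c(3.7457, 3.5800, 3.5983, 3.9225)
s    <- c(0.2840, 0.1821, 0.0962, 0.1971)
n    <- c(7, 8, 6, 8)

g <- length(n)  # number of treatment groups
N <- sum(n)     # total sample size

# Sums of squares, as in parts (a) and (b)
ss_trt <- sum(n * (ybar - sum(n * ybar) / N)^2)
sse    <- sum((n - 1) * s^2)

# Mean squares are SS / df; F is their ratio
ms_trt <- ss_trt / (g - 1)
mse    <- sse / (N - g)
f_stat <- ms_trt / mse

# Upper-tail p-value of the F statistic
p_value <- pf(f_stat, g - 1, N - g, lower.tail = FALSE)

anova_tab <- data.frame(
  Source = c("Treatment", "Error"),
  df     = c(g - 1, N - g),
  SS     = c(ss_trt, sse),
  MS     = c(ms_trt, mse),
  F_stat = c(f_stat, NA),
  P      = c(p_value, NA)
)
print(anova_tab, digits = 4)
```

Computing every quantity from unrounded intermediates means the F statistic and p-value may differ slightly from the hand-rounded values.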

(d) [2pts = 1pt for the P-value + 1pt for the conclusion] The P-value ≈ 0.01014 is found in R as follows:
> pf(4.66, 3, 25, lower.tail=FALSE)
[1] 0.01014091
In view of the small P-value, we can conclude that at least two of the diets have significantly different
effects on the liver weight of rats as a percentage of their body weight.

2. The data file is available at http://users.stat.umn.edu/~gary/book/fcdae.data/ex3.1. Let’s load
the data file and create the ANOVA table in R as follows.

> rats = read.table("http://users.stat.umn.edu/~gary/book/fcdae.data/ex3.1", header = TRUE)
> anova(lm(y ~ diet, data = rats))
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
diet 1 0.14903 0.149030 2.7493 0.1089
Residuals 27 1.46358 0.054207

The ANOVA table shows that the df for the variable diet is 1 instead of g − 1 = 4 − 1 = 3. This is because “diet”
was coded as 1, 2, 3, 4 in the data file, so R regards diet as a numerical variable and fits a linear
regression model in which y changes linearly with diet,
y = β0 + β1 diet + ε,
which has only two parameters, β0 and β1. The variable diet is in fact categorical, and we want R to
fit a model with a separate mean for each category,
yij = µi + εij,
which has four parameters µ1, µ2, µ3, and µ4.
We can use the command as.factor() to let R know that diet is categorical.

anova(lm(y ~ as.factor(diet), data = rats))

Run the R code above and see if it matches the ANOVA table you created in Problem 1.

Answer: [1pt]

> anova(lm(y ~ as.factor(diet), data = rats))
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(diet) 3 0.57821 0.192736 4.6581 0.01016 *
Residuals 25 1.03440 0.041376
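
The change in degrees of freedom between the two fits can be traced to the number of columns in the design matrix. The sketch below needs no response data: it rebuilds only the diet column from the group sizes in Problem 1, and it assumes the rows are ordered by diet (the order does not affect the column counts). The names X_num and X_fac are illustrative.

```r
# Diet labels reconstructed from the group sizes 7, 8, 6, 8
diet <- rep(1:4, times = c(7, 8, 6, 8))

# Numeric coding: intercept plus one slope
X_num <- model.matrix(~ diet)

# Factor coding: intercept plus three indicator columns (R's default
# treatment contrasts), equivalent to four separate group means
X_fac <- model.matrix(~ as.factor(diet))

ncol(X_num)                 # number of parameters in the regression model
ncol(X_fac)                 # number of parameters in the separate-means model
length(diet) - ncol(X_num)  # residual df with numeric coding
length(diet) - ncol(X_fac)  # residual df with factor coding
```

The residual degrees of freedom are N minus the number of columns, which gives 27 for the numeric coding and 25 for the factor coding, as in the two ANOVA tables above.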

3. Construct a 95% confidence interval for the mean percentage of body weight that is liver weight for rats
given the third diet µ3 . Show your work.

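Answer: [3pts] The 95% CI for µ3 is

$$\bar{y}_{3\bullet} \pm t_{\alpha/2,\,N-g}\,\frac{\sqrt{MSE}}{\sqrt{n_3}} \approx 3.5983 \pm 2.06 \times \frac{\sqrt{0.041376}}{\sqrt{6}} \approx 3.5983 \pm 0.171 = (3.4273,\ 3.7693)$$

[1pt each for the critical value 2.06, for $\sqrt{MSE} = \sqrt{0.041376}$, and for $\sqrt{n_3} = \sqrt{6}$.]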

where the critical value $t_{\alpha/2,\,df=N-g} = t_{0.05/2,\,df=29-4=25} \approx 2.06$ is found in R as follows:

> qt(0.05/2, df = 25, lower.tail=FALSE)
[1] 2.059539
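
The interval can also be computed in R in one pass. This sketch takes the MSE and its degrees of freedom from the ANOVA table in Problem 2; the variable names are illustrative.

```r
ybar3    <- 3.5983    # mean of diet group 3
n3       <- 6         # size of diet group 3
mse      <- 0.041376  # pooled MSE from the ANOVA table
df_error <- 25        # error degrees of freedom, N - g

# Critical value based on the error df, not on n3 - 1
t_crit <- qt(0.05 / 2, df = df_error, lower.tail = FALSE)

# Margin of error uses the pooled MSE rather than the group's own SD
margin <- t_crit * sqrt(mse / n3)

ci <- c(lower = ybar3 - margin, upper = ybar3 + margin)
round(ci, 4)
```

Using the unrounded critical value 2.059539 instead of 2.06 changes the endpoints only beyond the fourth decimal place.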

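[Give 0 pt if a one-sample CI is computed using only data in the third group as follows. This interval ignores the pooled MSE, which uses all four groups and has 25 degrees of freedom.]

$$\bar{y}_{3\bullet} \pm t_{\alpha/2,\,n_3-1}\,\frac{s_3}{\sqrt{n_3}} \approx 3.5983 \pm 2.5706 \times \frac{0.0962}{\sqrt{6}} \approx 3.5983 \pm 0.1010 = (3.4973,\ 3.6993)$$ ← [Wrong!]

> qt(0.05/2, df = 6-1, lower.tail=FALSE)
[1] 2.570582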

4. Make all pairwise comparisons of all 4 diets by t-tests. To present your answers, first create a table
with columns for: (1) the diets compared, (2) the mean differences, (3) the standard errors, and (4) the
t-statistics.

Diet pair   Mean diff.   Standard error (SE)   t-value
2-1
3-1
4-1
3-2
4-2
4-3

Finally, determine which pairs of diets are significantly different at the 5% significance level. Create an “underline
diagram” to summarize the result.

Answer: [Please grade conditionally. Please give full mark for the t-values as long as they are calculated
as (mean diff )/SE even if the SEs are incorrect. Similarly, please grade the significant pairs and the
underline diagrams conditionally based on the t-values. Do not penalize the same mistake twice.]
From the ANOVA table in Problem 1 or 2, we see the MSE is 0.041376 with 25 degrees of freedom.

Diet pair   Estimate of difference                      Standard error (SE)                t-value
i-k         $\bar{y}_{i\bullet} - \bar{y}_{k\bullet}$   $\sqrt{MSE(1/n_i + 1/n_k)}$        $(\bar{y}_{i\bullet} - \bar{y}_{k\bullet})/SE$
                                                        [3pts]                             [2pts]
2-1         3.5800 − 3.7457 = −0.1657                   √(0.041376(1/8 + 1/7)) = 0.10528   −1.574
3-1         3.5983 − 3.7457 = −0.1474                   √(0.041376(1/6 + 1/7)) = 0.11317   −1.302
4-1         3.9225 − 3.7457 = 0.1768                    √(0.041376(1/8 + 1/7)) = 0.10528   1.679
3-2         3.5983 − 3.5800 = 0.0183                    √(0.041376(1/6 + 1/8)) = 0.10985   0.167
4-2         3.9225 − 3.5800 = 0.3425                    √(0.041376(1/8 + 1/8)) = 0.10171   3.368
4-3         3.9225 − 3.5983 = 0.3242                    √(0.041376(1/8 + 1/6)) = 0.10985   2.951
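
The six rows of the table can be generated in R rather than computed one at a time. This is a sketch under the same inputs (group means, sizes, and the MSE of 0.041376); combn() is used only to enumerate the pairs, and the column names are illustrative.

```r
# Group means and sizes from Problem 1; pooled MSE from the ANOVA table
ybar <- c(3.7457, 3.5800, 3.5983, 3.9225)
n    <- c(7, 8, 6, 8)
mse  <- 0.041376

# All pairs (k, i) with k < i, in the order 2-1, 3-1, 4-1, 3-2, 4-2, 4-3
pairs <- t(combn(4, 2))
k <- pairs[, 1]
i <- pairs[, 2]

mean_diff <- ybar[i] - ybar[k]                  # estimated difference
se        <- sqrt(mse * (1 / n[i] + 1 / n[k]))  # standard error of the difference
t_value   <- mean_diff / se                     # t-statistic on N - g = 25 df

result <- data.frame(pair = paste(i, k, sep = "-"), mean_diff, se, t_value)
print(result, digits = 4)
```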

[2pts for identifying the significant pairs] The critical value at the 5% level, $t_{\alpha/2,\,df=N-g} = t_{0.05/2,\,df=29-4=25}$,
can be found in R as follows:

> qt(0.05/2, df = 25, lower.tail=FALSE)
[1] 2.059539

We see only the pairs 4-2 and 4-3 are significantly different at 5% level as only their t-statistics 3.368 and
2.951 exceed the critical value 2.059539.
Alternatively, one can also calculate the two-sided P-values for the 6 t-statistics above in R as follows.

Diet pair   2-1     3-1     4-1     3-2     4-2       4-3
P-value     0.128   0.205   0.106   0.869   0.00245   0.0068

> t = c(-1.574, -1.302, 1.679, 0.167, 3.368, 2.951)
> 2*pt(abs(t), df = 25, lower.tail=FALSE)
[1] 0.128059004 0.204785554 0.105609630 0.868713498 0.002454652 0.006790430

We can also see that the pairs 4-2 and 4-3 have P-values below 0.05.
[2pts for the underline diagram] To make the underline diagram, first we order the four diets by mean
from low to high.

2 3 1 4
3.5800 3.5983 3.7457 3.9225

As diet 2 is not significantly different from diet 3 or 1, but is significantly different from diet 4, we draw a line
underneath diets 2, 3, and 1 but not 4. As diets 3 and 1 are not significantly different, but diets 3 and 4 are,
we would draw a line from diet 3 to 1 but not 4. However, as this line is contained in the line from 2 through 3 to 1,
the line from 3 to 1 can be omitted. Finally, we draw a line from diet 1 to 4 as they are not significantly
different.

2        3        1        4
3.5800   3.5983   3.7457   3.9225
----------------------
                  ---------------
