
University of Guyana

Faculty of Natural Sciences

Department of Biology

BIO3211- Biometry and Biostatistics

Ms. Harris

Worksheet #03

[ANOVA, MANOVA, Kruskal-Wallis Tests]

Name/s and USI:

Jessica Seegobin (1041018)

Ron Thomas (1040804)

Date Due: April 24th, 2024


Questions:

TASK A

Null Hypothesis (H0): The mean crop yields of treatments A, B, and C do not differ significantly from one another.

Alternative Hypothesis (H1): The mean crop yield of at least one of treatments A, B, and C differs significantly from the others.

The attached data set called “crop yield data” is for the following questions. The premise is that three different fungicide treatments (A, B and C) were applied to a given crop over a number of years, and both the crop yield and the fungus density per block ID were recorded.

1. Determine if there is a significant difference in crop yield based on the treatments. If the null hypothesis is rejected, which treatment(s) account for the difference(s)?

CYD = Crop_yield_data

summary(CYD)

 Fungus density      block       Fungicide              yield
 Min.   :1.0     Min.   :1.00   Length:96          Min.   :175.4
 1st Qu.:1.0     1st Qu.:1.75   Class :character   1st Qu.:176.5
 Median :1.5     Median :2.50   Mode  :character   Median :177.1
 Mean   :1.5     Mean   :2.50                      Mean   :177.0
 3rd Qu.:2.0     3rd Qu.:3.25                      3rd Qu.:177.4
 Max.   :2.0     Max.   :4.00                      Max.   :179.1


hist(CYD$`Fungus density`)

hist(CYD$block)
hist(CYD$yield)

shapiro.test(CYD$`Fungus density`)

data: CYD$`Fungus density`

W = 0.63641, p-value = 4.514e-14

shapiro.test(CYD$block)

data: CYD$block

W = 0.85581, p-value = 3.201e-08

shapiro.test(CYD$yield)

data: CYD$yield

W = 0.98898, p-value = 0.6123
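Fungus density (coded 1 and 2) and block (coded 1 to 4) are group labels, so a Shapiro-Wilk test on them describes how the labels are distributed, not the ANOVA assumption, which concerns the residuals within groups. A minimal R sketch of checking the residuals instead, plus Bartlett's test for equal variances across fungicides, is below. Because the crop yield data are not reproduced here, it builds a simulated stand-in data frame (`sim_cyd`, hypothetical values and a hypothetical `density` column name) with the same 96-row layout; substituting `CYD` should run the same checks on the real data.

```r
# Simulated stand-in with the same layout as CYD (hypothetical values):
# 3 fungicides x 4 blocks x 2 fungus densities x 4 plots = 96 rows
set.seed(1)
sim_cyd <- expand.grid(Fungicide = c("A", "B", "C"),
                       block = 1:4,
                       density = 1:2,
                       plot = 1:4)
sim_cyd$yield <- rnorm(nrow(sim_cyd), mean = 177, sd = 0.6)

# Fit the one-way model and test its residuals, not the grouping codes
fit <- aov(yield ~ Fungicide, data = sim_cyd)
res_test <- shapiro.test(residuals(fit))
print(res_test)

# Bartlett's test of equal variances across the fungicide groups
var_test <- bartlett.test(yield ~ Fungicide, data = sim_cyd)
print(var_test)
```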


summary(CYD$`Fungus density`)

Min. 1st Qu. Median Mean 3rd Qu. Max.

1.0 1.0 1.5 1.5 2.0 2.0

summary(CYD$block)

Min. 1st Qu. Median Mean 3rd Qu. Max.

1.00 1.75 2.50 2.50 3.25 4.00

summary(CYD$yield)

Min. 1st Qu. Median Mean 3rd Qu. Max.

175.4 176.5 177.1 177.0 177.4 179.1

y.one.way = aov(yield~Fungicide, data = CYD)

y.one.way

Call:

aov(formula = yield ~ Fungicide, data = CYD)

Terms:

Fungicide Residuals

Sum of Squares 6.06777 35.88754

Deg. of Freedom 2 93

Residual standard error: 0.6211984

Estimated effects may be unbalanced


summary(y.one.way)

Df Sum Sq Mean Sq F value Pr(>F)

Fungicide 2 6.07 3.0339 7.862 7e-04 ***

Residuals 93 35.89 0.3859

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
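The F value in this table is the treatment mean square divided by the residual mean square, and Pr(>F) is the upper tail of the F distribution with 2 and 93 degrees of freedom. The short R sketch below reproduces both from the sums of squares printed above; it only re-derives the printed numbers and does not need the raw data.

```r
# Values copied from the one-way ANOVA table above
ss_trt <- 6.06777    # Fungicide sum of squares
ss_res <- 35.88754   # Residual sum of squares
df_trt <- 2          # k - 1, with k = 3 treatments
df_res <- 93         # N - k, with N = 96 plots

ms_trt  <- ss_trt / df_trt   # mean square for treatments
ms_res  <- ss_res / df_res   # mean square error
f_value <- ms_trt / ms_res   # F statistic
p_value <- pf(f_value, df_trt, df_res, lower.tail = FALSE)  # upper-tail probability

round(c(MS_trt = ms_trt, MS_res = ms_res, F = f_value, p = p_value), 4)
```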

TukeyHSD(y.one.way)

Tukey multiple comparisons of means

95% family-wise confidence level

Fit: aov(formula = yield ~ Fungicide, data = CYD)

$Fungicide

diff lwr upr p adj

B-A 0.1762187 -0.1936759 0.5461134 0.4952879

C-A 0.5991250 0.2292304 0.9690196 0.0006127

C-B 0.4229063 0.0530116 0.7928009 0.0208947
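Each Tukey interval is the difference in means plus or minus a studentized-range quantile times a standard error built from the residual mean square. The R sketch below reconstructs the intervals and adjusted p-values from the printed output, assuming 32 plots per treatment (96 plots split evenly over three treatments, which is consistent with all three intervals having the same width).

```r
# Values from the one-way ANOVA and Tukey output above
ms_res <- 35.88754 / 93   # mean square error
df_res <- 93
n_per  <- 32              # assumed plots per treatment (96 / 3, balanced)
k      <- 3               # number of treatments
diffs  <- c("B-A" = 0.1762187, "C-A" = 0.5991250, "C-B" = 0.4229063)

se     <- sqrt((ms_res / 2) * (1 / n_per + 1 / n_per))  # SE used by Tukey HSD
q_crit <- qtukey(0.95, nmeans = k, df = df_res)         # studentized range quantile
half_w <- q_crit * se                                   # half-width of each 95% interval
p_adj  <- ptukey(abs(diffs) / se, nmeans = k, df = df_res, lower.tail = FALSE)
names(p_adj) <- names(diffs)

round(cbind(diff = diffs, lwr = diffs - half_w, upr = diffs + half_w, p_adj = p_adj), 7)
```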

Conclusion:

The one-way ANOVA gives F(2, 93) = 7.862 with p = 7e-04, which is below 0.05, so we reject the null hypothesis that crop yield does not differ among the fungicide treatments. The Tukey HSD comparisons show that treatment C differs significantly from both A (p adj = 0.0006127) and B (p adj = 0.0208947), while A and B do not differ significantly (p adj = 0.4952879). Treatment C, which gave the highest mean yield, therefore accounts for the difference.

2. Determine if there is a significant difference in crop yield based on the treatments and fungus density. If the null hypothesis is rejected, which treatment(s) account for the difference(s)?

y.two.way = aov(yield~Fungicide+`Fungus density`, data = CYD)

y.two.way

Call:

aov(formula = yield ~ Fungicide + `Fungus density`, data = CYD)

Terms:

Fungicide `Fungus density` Residuals

Sum of Squares 6.067771 5.122656 30.764881

Deg. of Freedom 2 1 92

Residual standard error: 0.578274

Estimated effects may be unbalanced

summary(y.two.way)

Df Sum Sq Mean Sq F value Pr(>F)

Fungicide 2 6.068 3.034 9.073 0.000253 ***

`Fungus density` 1 5.123 5.123 15.319 0.000174 ***


Residuals 92 30.765 0.334

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion:

With F(2, 92) = 9.073 (p = 0.000253) for fungicide and F(1, 92) = 15.319 (p = 0.000174) for fungus density, we reject the null hypothesis that there is no significant difference in crop yield based on fungicide and fungus density: both factors have a significant effect on crop yield.
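The fungicide sum of squares is the same in the one-way and two-way tables (6.068), yet its F value rises from 7.862 to 9.073. This happens because fungus density removes about 5.12 units of sum of squares from the residual, which shrinks the residual mean square that every F value is divided by. A short R check using only the printed tables:

```r
# Sums of squares from the one-way and two-way tables above
ss_fung <- 6.067771
ss_dens <- 5.122656
ms_res_one <- 35.88754 / 93    # residual MS, yield ~ Fungicide
ms_res_two <- 30.764881 / 92   # residual MS, yield ~ Fungicide + Fungus density

f_fung_one <- (ss_fung / 2) / ms_res_one   # fungicide F, one-way model
f_fung_two <- (ss_fung / 2) / ms_res_two   # fungicide F, two-way model
f_dens_two <- (ss_dens / 1) / ms_res_two   # fungus density F, two-way model

round(c(one_way = f_fung_one, two_way = f_fung_two, density = f_dens_two), 3)
```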

3. Are there any interactions between treatment and fungus density, year, or block? Is there a significant difference in crop yield based on the treatments and interactions? If the null hypothesis is rejected, which treatment(s) and interactions account for the difference(s)?

intdens = aov(CYD$yield~CYD$Fungicide*CYD$`Fungus density`)

summary(intdens)

Df Sum Sq Mean Sq F Value Pr(>F)

CYD$Fungicide 2 6.068 3.034 9.001 0.000273 ***

CYD$`Fungus density` 1 5.123 5.123 15.197 0.000186 ***

CYD$Fungicide:CYD$`Fungus density` 2 0.428 0.214 0.635 0.532471

Residuals 90 30.337 0.337

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

intblock = aov(CYD$yield~CYD$Fungicide*CYD$block)
summary(intblock)

Df Sum Sq Mean Sq F Value Pr(>F)

CYD$Fungicide 2 6.07 3.0339 7.656 0.00085 ***

CYD$block 1 0.15 0.1508 0.381 0.53887

CYD$Fungicide:CYD$block 2 0.07 0.0356 0.090 0.91407

Residuals 90 35.67 0.3963

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion:

Neither interaction is significant. For fungicide by fungus density, F(2, 90) = 0.635 and p = 0.532471; for fungicide by block, F(2, 90) = 0.090 and p = 0.91407. We therefore fail to reject the null hypotheses that these interactions affect crop yield: the effect of the fungicide does not depend on fungus density or on the block in which the crop is grown. In these models the fungicide main effect remains significant (p = 0.000273 and p = 0.00085), as does fungus density (p = 0.000186), while block on its own is not significant (p = 0.53887).
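The interaction row in these tables can also be read as a comparison of two nested models: the additive model and the model with the interaction added. A minimal R sketch on a simulated stand-in (`sim`, hypothetical values and a hypothetical `density` column name with the CYD layout) shows that `anova()` on the two fits gives the same F as the interaction row of the full model's sequential table.

```r
# Simulated stand-in with the CYD layout (hypothetical values)
set.seed(2)
sim <- expand.grid(Fungicide = c("A", "B", "C"), density = factor(1:2),
                   block = factor(1:4), plot = 1:4)
sim$yield <- 177 + 0.5 * (sim$Fungicide == "C") - 0.4 * (sim$density == "2") +
  rnorm(nrow(sim), sd = 0.6)

fit_add <- lm(yield ~ Fungicide + density, data = sim)   # main effects only
fit_int <- lm(yield ~ Fungicide * density, data = sim)   # adds Fungicide:density

# F test for the interaction: does adding it reduce the residual SS enough?
cmp <- anova(fit_add, fit_int)
print(cmp)

# The same F appears on the interaction row of the sequential ANOVA table
seq_tab <- anova(fit_int)
print(seq_tab)
```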

4. Does blocking have an effect on whether there is a significant difference in crop yield based on the treatments and the other factors?

y.block = aov(yield~Fungicide+block, data = CYD)

y.block

Call:

aov(formula = yield ~ Fungicide + block, data = CYD)


Terms:

Fungicide block Residuals

Sum of Squares 6.06777 0.15080 35.73673

Deg. of Freedom 2 1 92

Residual standard error: 0.6232517

Estimated effects may be unbalanced

summary(y.block)

Df Sum Sq Mean Sq F value Pr(>F)

Fungicide 2 6.07 3.0339 7.810 0.000736 ***

block 1 0.15 0.1508 0.388 0.534774

Residuals 92 35.74 0.3884

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Interpretation:

Blocking does not have a significant effect: for block, F(1, 92) = 0.388 and p = 0.534774, so we fail to reject the null hypothesis that block has no effect on crop yield. The fungicide effect remains significant after accounting for block (F(2, 92) = 7.810, p = 0.000736), so blocking does not change the conclusion about the treatments.
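In summary(CYD), block runs from 1.00 to 4.00 with a mean of 2.50, so it is stored as a number, and in the tables above block (and its interaction with fungicide) carries only 1 (and 2) degrees of freedom. aov() therefore fits block as a single linear trend across blocks rather than as four separate blocks with 3 df. A minimal R sketch on a simulated stand-in (`sim`, hypothetical values with the CYD layout) shows the difference that converting block to a factor makes.

```r
# Simulated stand-in with the CYD layout (hypothetical values);
# block is stored as the numbers 1 to 4, as in summary(CYD)
set.seed(3)
sim <- expand.grid(Fungicide = c("A", "B", "C"), block = 1:4,
                   density = 1:2, plot = 1:4)
sim$yield <- rnorm(nrow(sim), mean = 177, sd = 0.6)

# Numeric block: fitted as one linear trend across blocks (1 df)
fit_num <- aov(yield ~ Fungicide + block, data = sim)

# Factor block: one effect per block (4 - 1 = 3 df)
sim$block_f <- factor(sim$block)
fit_fac <- aov(yield ~ Fungicide + block_f, data = sim)

summary(fit_num)
summary(fit_fac)
```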

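5. Which model(s) best describes the crop yield?

- The additive two-way model, yield ~ Fungicide + Fungus density, best describes the crop yield. Both of its terms are significant (p = 0.000253 and p = 0.000174), it has the smallest residual standard error of the models fitted (0.578, compared with 0.621 for yield ~ Fungicide and 0.623 for yield ~ Fungicide + block), and neither the block term nor the interaction terms are significant. Because yield passed the Shapiro-Wilk test (p = 0.6123), a parametric ANOVA model is appropriate and a non-parametric model is not required.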
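One way to compare the fitted models on a common scale is adjusted R², which penalises each extra term. The R sketch below computes R² and adjusted R² from the residual sums of squares printed in the tables above (total SS = 6.06777 + 35.88754, n = 96). It is one possible criterion, not the only one; AIC on the fitted model objects would be an alternative.

```r
# Residual SS and df for each fitted model, copied from the tables above
sst <- 41.95531   # total SS (6.06777 + 35.88754)
n   <- 96
models <- data.frame(
  model  = c("yield ~ Fungicide",
             "yield ~ Fungicide + Fungus density",
             "yield ~ Fungicide + block"),
  ss_res = c(35.88754, 30.764881, 35.73673),
  df_res = c(93, 92, 92)
)
models$r2     <- 1 - models$ss_res / sst
models$adj_r2 <- 1 - (models$ss_res / models$df_res) / (sst / (n - 1))
models[order(-models$adj_r2), ]
```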


TASK B

The attached data set called “Bacterial growth” is for the following questions. It comes from an experiment in which the optimal conditions for growth and product formation were determined for a bacterial strain grown in a broth with a certain carbon source. Two nitrogen sources (yeast extract and ammonium chloride) and three incubation temperatures (30, 35 and 37 °C) were evaluated. Bacterial growth was measured after 24 hours as dry cell weight (mg/ml) and optical density at 600 nm, and the yield of a desired fermentation product was determined by gas chromatography and expressed in mM.

1. Test whether there is a statistical difference in Optical Density and Product Yield with Temperature as a factor. If the null hypothesis is rejected, which variable(s) account for the difference(s)?

bg = bacterial_growth

summary(bacterial_growth)

  Experiment        Temperature    N-source            Replica         Dry weight
 Length:120         Min.   :30    Length:120         Min.   : 1.00   Min.   : 1.670
 Class :character   1st Qu.:30    Class :character   1st Qu.: 5.75   1st Qu.: 4.805
 Mode  :character   Median :35    Mode  :character   Median :10.50   Median : 6.070
                    Mean   :34                       Mean   :10.50   Mean   : 6.024
                    3rd Qu.:37                       3rd Qu.:15.25   3rd Qu.: 7.080
                    Max.   :37                       Max.   :20.00   Max.   :10.300
 Optical density   Product yield
 Min.   :0.210     Min.   : 8.40
 1st Qu.:1.770     1st Qu.:41.90
 Median :1.910     Median :57.65
 Mean   :1.976     Mean   :57.10
 3rd Qu.:2.223     3rd Qu.:72.70
 Max.   :2.480     Max.   :93.10

hist(bg$`Optical density`)

hist(bg$Temperature)
hist(bg$`Product yield`)

hist(bg$`Dry weight`)

hist(bg$Replica)
shapiro.test(bg$`Optical density`)

data: bg$`Optical density`

W = 0.86969, p-value = 7.32e-09

shapiro.test(bg$Temperature)

data: bg$Temperature

W = 0.74714, p-value = 4.311e-13

shapiro.test(bg$`Product yield`)

data: bg$`Product yield`

W = 0.97032, p-value = 0.009352

shapiro.test(bg$`Dry weight`)

data: bg$`Dry weight`


W = 0.99331, p-value = 0.8383

OP.man.temp = manova(cbind(bg$`Optical density`, bg$`Product yield`)~bg$Temperature)

OP.man.temp

Call:

manova(cbind(bg$`Optical density`, bg$`Product yield`) ~ bg$Temperature)

Terms:

bg$Temperature Residuals

resp 1 0.16 10.72

resp 2 38.19 44701.36

Deg. of Freedom 1 118

Residual standard errors: 0.301477 19.46343

Estimated effects may be unbalanced

summary(OP.man.temp)

Df Pillai approx F num Df den Df Pr(>F)

bg$Temperature 1 0.022243 1.3309 2 117 0.2682

Residuals 118
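In summary(bacterial_growth), Temperature is numeric (Min. 30, Mean 34, Max. 37), and in this MANOVA it has 1 degree of freedom, so it is fitted as a single linear trend rather than as three separate temperatures. A minimal R sketch on a simulated stand-in (`sim_bg`, hypothetical values and hypothetical column names `OD`, `Yield`, `N_source`) shows the factor version (2 df) and the univariate follow-up that `summary.aov()` gives for each response.

```r
# Simulated stand-in with the bacterial_growth layout (hypothetical values):
# 20 replicates x 3 temperatures x 2 nitrogen sources = 120 rows
set.seed(4)
sim_bg <- expand.grid(Replica = 1:20, Temperature = c(30, 35, 37),
                      N_source = c("Yeast extract", "Ammonium chloride"))
sim_bg$OD    <- rnorm(nrow(sim_bg), mean = 2, sd = 0.3)
sim_bg$Yield <- rnorm(nrow(sim_bg), mean = 57, sd = 19)

# Temperature stored as a number is fitted as a single linear trend (1 df)
fit_num <- manova(cbind(OD, Yield) ~ Temperature, data = sim_bg)

# As a factor it has one effect per temperature (3 - 1 = 2 df)
sim_bg$Temp_f <- factor(sim_bg$Temperature)
fit_fac <- manova(cbind(OD, Yield) ~ Temp_f, data = sim_bg)

summary(fit_num, test = "Pillai")
summary(fit_fac, test = "Pillai")

# Univariate follow-up: one ANOVA per response
print(summary.aov(fit_fac))
```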

2. Test whether there is a statistical difference in Optical Density and Product Yield between Nitrogen Sources. If the null hypothesis is rejected, which variable(s) account for the difference(s)?


OP.man.source = manova(cbind(bg$`Optical density`, bg$`Product yield`)~bg$`N-source`)

OP.man.source

Call:

manova(cbind(bg$`Optical density`, bg$`Product yield`) ~ bg$`N-source`)

Terms:

bg$`N-source` Residuals

resp 1 6.707 4.182

resp 2 27579.07 17160.49

Deg. of Freedom 1 118

Residual standard errors: 0.1882613 12.05935

Estimated effects may be unbalanced

summary(OP.man.source)

Df Pillai approx F num Df den Df Pr(>F)

bg$`N-source` 1 0.71779 148.79 2 117 < 2.2e-16 ***

Residuals 118

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
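The MANOVA shows that the two responses differ jointly between nitrogen sources. To see which response(s) account for this, one can look at the univariate ANOVA for each response, which is what `summary.aov(OP.man.source)` would print. The R sketch below approximates those F values from the sums of squares in the Terms table above; because that table is rounded, the results are approximate.

```r
# Sums of squares from the Terms table of OP.man.source above
ss_nsrc <- c(OD = 6.707, Yield = 27579.07)   # N-source SS, resp 1 and resp 2
ss_res  <- c(OD = 4.182, Yield = 17160.49)   # residual SS
df_nsrc <- 1
df_res  <- 118

f_uni <- (ss_nsrc / df_nsrc) / (ss_res / df_res)   # univariate F per response
p_uni <- pf(f_uni, df_nsrc, df_res, lower.tail = FALSE)
names(p_uni) <- names(f_uni)

signif(rbind(F = f_uni, p = p_uni), 4)
```

Both responses give very large F values, so both Optical Density and Product Yield contribute to the nitrogen-source difference.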

3. Determine whether there is a statistical difference in Dry weight, Optical Density and Product Yield with Nitrogen Source and Temperature as factors. If the null hypothesis is rejected, which variable(s) account for the difference(s)?

DOP.man = manova(cbind(bg$`Optical density`,bg$`Dry weight`, bg$`Product yield`)~bg$`N-source`*bg$Temperature)

DOP.man

Call:

manova(cbind(bg$`Optical density`, bg$`Dry weight`, bg$`Product yield`) ~

bg$`N-source` * bg$Temperature)

Terms:

bg$`N-source` bg$Temperature bg$`N-source`:bg$Temperature Residuals

resp 1 6.707 0.165 0.165 3.853

resp 2 8.791 1.232 0.978 294.713

resp 3 27579.072 38.193 305.753 16816.539

Deg. of Freedom 1 1 1 116

Residual standard errors: 0.1822555 1.593934 12.04036

Estimated effects may be unbalanced

summary(DOP.man)

Df Pillai approx F num Df den Df Pr(>F)

bg$`N-source` 1 0.73205 103.817 3 114 < 2e-16 ***

bg$Temperature 1 0.04171 1.654 3 114 0.18095

bg$`N-source`:bg$Temperature 1 0.05481 2.204 3 114 0.09147 .

Residuals 116
---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
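Each row of the DOP.man table converts Pillai's trace V into an approximate F. Every term here has one hypothesis degree of freedom, and for such terms the conversion is exact: F = V / (1 - V) × (df_res - p + 1) / p on p and df_res - p + 1 degrees of freedom, where p is the number of responses (3) and df_res = 116. A short R sketch that reproduces the table from the printed Pillai values:

```r
# Pillai's trace to F for a term with 1 hypothesis df (s = 1),
# which holds for every term in DOP.man
pillai_to_f <- function(V, p, df_res) {
  df1 <- p                 # numerator df = number of responses
  df2 <- df_res - p + 1    # denominator df
  f   <- (V / (1 - V)) * df2 / df1
  c(F = f, num_df = df1, den_df = df2,
    p_value = pf(f, df1, df2, lower.tail = FALSE))
}

pillai <- c(N_source = 0.73205, Temperature = 0.04171, Interaction = 0.05481)
out <- t(sapply(pillai, pillai_to_f, p = 3, df_res = 116))
round(out, 5)
```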

Conclusion:

For question 1, Pillai's trace = 0.022243 (approx. F(2, 117) = 1.3309, p = 0.2682), so we fail to reject the null hypothesis: temperature has no statistically significant effect on Optical Density and Product Yield. For question 2, Pillai's trace = 0.71779 (approx. F(2, 117) = 148.79, p < 2.2e-16), so we reject the null hypothesis: Optical Density and Product Yield differ significantly between the nitrogen sources. For question 3, the nitrogen source again has a significant effect on Dry Weight, Optical Density and Product Yield (Pillai's trace = 0.73205, approx. F(3, 114) = 103.817, p < 2e-16), whereas temperature (Pillai's trace = 0.04171, approx. F(3, 114) = 1.654, p = 0.18095) and the N-source:Temperature interaction (Pillai's trace = 0.05481, approx. F(3, 114) = 2.204, p = 0.09147) are not significant at the 0.05 level. The nitrogen source is therefore the factor that accounts for the differences.

TASK C

Treatment and weight results of beetles.

Weights

Treatment 1 (52,46,62,48,57,54)

Treatment 2 (66,49,64,53,68)

Treatment 3 (63,65,58,70,71,73)
Hypothesis

Null hypothesis: The mean weight of the beetles is the same under all three treatments.

Alternate hypothesis: The mean weight of the beetles differs for at least one treatment.

Analysis of Variance Table

Response: weights

Df Sum Sq Mean Sq F value Pr(>F)

treatments 1 1166.9 1166.9 4.3082e+32 < 2.2e-16 ***

Residuals 15 0.0 0.0

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R Scripts

#weights of beetles
treatment1<-c(52,46,62,48,57,54)
treatment2<-c(66,49,64,53,68)
treatment3<-c(63,65,58,70,71,73)
# group labels: one label per beetle, repeated to match the group sizes (6, 5, 6)
treatments<-factor(rep(c("Treatment1","Treatment2","Treatment3"), times=c(6,5,6)))
weights<-c(treatment1, treatment2, treatment3)
beetle_weights<-data.frame(treatments, weights)
# one-way ANOVA of weight on treatment group
anova(lm(weights~treatments, data=beetle_weights))
summary(beetle_weights)
hist(beetle_weights$weights)
shapiro.test(beetle_weights$weights)

> #weights of beetles

> treatment1<-c(52,46,62,48,57,54)
> treatment2<-c (66,49,64,53,68)

> treatment3<-c(63,65,58,70,71,73)

> treatments<-c(treatment1, treatment2, treatment3)

> weights<-c(treatment1, treatment2, treatment3)

> beetle_weights<-data.frame(treatments, weights)

> anova(lm(weights~treatments, data=beetle_weights))

Analysis of Variance Table


Response: weights

Df Sum Sq Mean Sq F value Pr(>F)

treatments 1 1166.9 1166.9 4.3082e+32 < 2.2e-16 ***

Residuals 15 0.0 0.0

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(beetle_weights)

treatments weights

Min. :46.00 Min. :46.00

1st Qu.:53.00 1st Qu.:53.00

Median :62.00 Median :62.00

Mean :59.94 Mean :59.94


3rd Qu.:66.00 3rd Qu.:66.00

Max. :73.00 Max. :73.00

> hist(treatments, weights)

> shapiro.test(treatments)

Shapiro-Wilk normality test


data: treatments

W = 0.95264, p-value = 0.4996

>
Conclusion:

A one-way ANOVA was used to test for differences in beetle weight among the three treatments; it compares all groups in a single test and so controls the Type I error that repeated pairwise comparisons would inflate. The ANOVA table above, however, reports p < 2.2e-16 with a residual sum of squares of 0, because treatments was built from the same values as weights, so the model regressed the weights on themselves. With the treatments coded as group labels, the one-way ANOVA gives F(2, 14) ≈ 6.17 and p ≈ 0.012. The Shapiro-Wilk test gave p = 0.4996, which is above 0.05, so normality is not rejected and the ANOVA is appropriate. Since the ANOVA p-value is below the predetermined significance level of 0.05, we reject the null hypothesis and conclude that beetle weight differs significantly among the treatments.
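The worksheet title also lists Kruskal-Wallis tests. As a rank-based check that does not assume normality, the R sketch below runs the one-way ANOVA (with the groups coded as a factor) and the Kruskal-Wallis test side by side on the beetle weights from this task; the names `weight` and `group` are introduced here for illustration.

```r
# Beetle weights from Task C, in long format: one value per beetle
weight <- c(52, 46, 62, 48, 57, 54,   # Treatment 1
            66, 49, 64, 53, 68,       # Treatment 2
            63, 65, 58, 70, 71, 73)   # Treatment 3
group  <- factor(rep(c("T1", "T2", "T3"), times = c(6, 5, 6)))

# Parametric: one-way ANOVA with equal variances assumed
aov_tab <- summary(aov(weight ~ group))[[1]]
f_stat  <- aov_tab$`F value`[1]
f_p     <- aov_tab$`Pr(>F)`[1]

# Rank-based alternative: Kruskal-Wallis test (no normality assumption)
kw <- kruskal.test(weight ~ group)

round(c(F = f_stat, p_anova = f_p, H = unname(kw$statistic), p_kw = kw$p.value), 4)
```

Both tests give p-values below 0.05, so the conclusion does not depend on the normality assumption.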

TASK D

Biomass results obtained from an experiment on bacteria treated with three plant extracts (Extracts 1-3).

Biomass

Extract1 (64, 66, 68, 75, 78, 94, 98, 79, 71, 80)

Extract2 (91, 92, 93, 85, 87, 84, 82, 88, 95, 96)

Extract3 (79, 78, 88, 94, 92, 85, 83, 85, 82, 81)

Hypothesis

Null hypothesis: The mean biomass of bacteria is the same for all three extracts.

Alternate hypothesis: The mean biomass of bacteria differs for at least one of the three extracts.


Analysis of Variance Table

Response: Extracts

Df Sum Sq Mean Sq F value Pr(>F)

Biomass 1 2347.4 2347.4 6.8073e+32 < 2.2e-16 ***

Residuals 28 0.0 0.0

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R Scripts

#results for plant extract treatment (extracts 1-3).
Extract1<-c(64, 66, 68, 75, 78, 94, 98, 79, 71, 80)
Extract2<-c(91, 92, 93, 85, 87, 84, 82, 88, 95, 96)
Extract3<-c(79, 78, 88, 94, 92, 85, 83, 85, 82, 81)
# extract labels: one per biomass value, 10 per extract
Extracts<-factor(rep(c("Extract1","Extract2","Extract3"), each=10))
Biomass<-c(Extract1, Extract2, Extract3)
Biomass_bacteria<-data.frame(Extracts, Biomass)
# one-way ANOVA: biomass (response) explained by extract (factor)
anova(lm(Biomass~Extracts, data=Biomass_bacteria))
summary(Biomass_bacteria)
plot(Biomass)
shapiro.test(Biomass)

#results for plant extract treatment (extracts 1-3).

> Extract1<-c(64, 66, 68, 75, 78, 94, 98, 79, 71, 80)

> Extract2<-c(91, 92, 93, 85, 87, 84, 82, 88, 95, 96)

> Extract3<-c(79, 78, 88, 94, 92, 85, 83, 85, 82, 81)


> Extracts<-c(Extract1, Extract2, Extract3)

> Biomass<-c(Extract1, Extract2, Extract3)

> Biomass_bacteria<-data.frame(Extracts, Biomass)

> anova(lm(Extracts~Biomass, data=Biomass_bacteria))

Analysis of Variance Table

Response: Extracts

Df Sum Sq Mean Sq F value Pr(>F)

Biomass 1 2347.4 2347.4 6.8073e+32 < 2.2e-16 ***

Residuals 28 0.0 0.0

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Warning message:

In anova.lm(lm(Extracts ~ Biomass, data = Biomass_bacteria)) :

ANOVA F-tests on an essentially perfect fit are unreliable

> summary(Biomass_bacteria)

Extracts Biomass
Min. :64.00 Min. :64.00

1st Qu.:79.00 1st Qu.:79.00

Median :84.50 Median :84.50

Mean :83.77 Mean :83.77

3rd Qu.:91.75 3rd Qu.:91.75

Max. :98.00 Max. :98.00

> plot(Biomass)

> shapiro.test(Biomass)

Shapiro-Wilk normality test

data: Biomass

W = 0.95916, p-value = 0.2948


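Conclusion: A one-way ANOVA was used to compare bacterial biomass among the three plant extract treatments. The table above reports p < 2.2e-16 with a residual sum of squares of 0, but R warns that this is an essentially perfect fit: Extracts and Biomass hold the same values, so the model regressed biomass on itself. With the extracts coded as group labels, the one-way ANOVA gives F(2, 27) ≈ 6.13 and p ≈ 0.0064. The Shapiro-Wilk test gave p = 0.2948, which is above 0.05, so normality is not rejected. Since the ANOVA p-value is below the predetermined significance level of 0.05, the null hypothesis is rejected: mean biomass differs among the extracts.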
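Rejecting the null hypothesis says only that at least one extract differs. A Tukey HSD follow-up, as used in Task A, identifies which pairs differ. The R sketch below fits the one-way ANOVA with the extracts coded as a factor and runs `TukeyHSD()`; the names `biomass` and `extract` are introduced here for illustration.

```r
# Biomass values from Task D, with one extract label per observation
biomass <- c(64, 66, 68, 75, 78, 94, 98, 79, 71, 80,   # Extract1
             91, 92, 93, 85, 87, 84, 82, 88, 95, 96,   # Extract2
             79, 78, 88, 94, 92, 85, 83, 85, 82, 81)   # Extract3
extract <- factor(rep(c("Extract1", "Extract2", "Extract3"), each = 10))

fit <- aov(biomass ~ extract)
tab <- summary(fit)[[1]]
print(tab)

# Tukey HSD: which pairs of extracts differ, with family-wise 95% confidence
tk <- TukeyHSD(fit, "extract")
print(tk)
```

With these data, only the Extract2 versus Extract1 comparison falls below 0.05 after adjustment.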
