
Exercise 8 – MICRO 110 Spring 2022

Today's exercise is about ANOVA (analysis of variance). We use ANOVA when we have more than two
means to compare. Module 2, slides 50-61 cover ANOVA in detail. The goal of today's exercise is to
perform an ANOVA calculation without using a library function and then compare your results with the
built-in ANOVA function in the scipy.stats library. I have labelled the variables so that you can easily
follow along with the ANOVA lecture slides.

We first calculate D (deviations from the grand average), T (treatment deviations), and R (residuals,
i.e. within-treatment deviations) for each paint supplier.
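As a minimal sketch of this first step, the snippet below computes D, T, and R with NumPy. The numbers are made-up placeholder values, not the paint data from this exercise, and the layout (rows are sites, columns are treatments) is an assumption; substitute the table from the question.

```python
import numpy as np

# Placeholder data, NOT the exercise data.
# Assumed layout: each row is a site, each column is a treatment (supplier).
data = np.array([
    [1.0, 2.0, 6.0],
    [2.0, 3.0, 7.0],
    [3.0, 4.0, 8.0],
])

grand_mean = data.mean()             # average of every observation
treatment_means = data.mean(axis=0)  # one average per column (treatment)

D = data - grand_mean                                          # deviations from grand average
T = np.broadcast_to(treatment_means - grand_mean, data.shape)  # treatment deviations
R = data - treatment_means                                     # residuals within treatments

# Every deviation splits into a treatment part and a residual part.
assert np.allclose(D, T + R)

print("grand mean:", grand_mean)
print("treatment means:", treatment_means)
```

The final assertion checks the identity D = T + R, which is a useful sanity check before moving on to the sums of squares.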

Second, we calculate the sums of squares for D, T, and R. We call these values SD, ST, and SR.

VD, VT, and VR (written υD, υT, and υR in the questions below) are the degrees of freedom
associated with D, T, and R, respectively.

MT and MR are the mean square values, each calculated as a sum of squares divided by its degrees of
freedom (MT = ST/VT and MR = SR/VR).

Finally, the F ratio is the ratio of the two mean square values, F = MT/MR.
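The steps above can be sketched as one small function. This is an illustration under stated assumptions, not the official solution: the function name `anova_by_hand` is hypothetical, the input is assumed to be a 2-D array with one column per treatment and the same number of observations in each column, and the numbers are placeholders rather than the exercise data.

```python
import numpy as np

def anova_by_hand(data):
    """One-way ANOVA quantities for a 2-D array (rows = sites, columns = treatments).

    Hypothetical helper; assumes every treatment has the same number of observations.
    """
    data = np.asarray(data, dtype=float)
    n_rows, n_treatments = data.shape
    grand_mean = data.mean()
    treatment_means = data.mean(axis=0)

    # Sums of squares
    SD = ((data - grand_mean) ** 2).sum()                      # total
    ST = n_rows * ((treatment_means - grand_mean) ** 2).sum()  # between treatments
    SR = ((data - treatment_means) ** 2).sum()                 # within treatments

    # Degrees of freedom
    vD = data.size - 1
    vT = n_treatments - 1
    vR = data.size - n_treatments

    # Mean squares and F ratio
    mT = ST / vT
    mR = SR / vR
    return {"SD": SD, "ST": ST, "SR": SR,
            "vD": vD, "vT": vT, "vR": vR,
            "mT": mT, "mR": mR, "F": mT / mR}

# Placeholder data, NOT the exercise data.
placeholder = np.array([
    [1.0, 2.0, 6.0],
    [2.0, 3.0, 7.0],
    [3.0, 4.0, 8.0],
])

table = anova_by_hand(placeholder)
for name, value in table.items():
    print(f"{name}: {value}")
```

Two quick checks on any result: SD should equal ST + SR, and vD should equal vT + vR.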

We use the F ratio table (or, in our case, the cumulative distribution function of the F distribution) to
calculate the p-value of the ANOVA test, in the same way that we used the t ratio table for t-test p-values.

https://www.socscistatistics.com/pvalues/fdistribution.aspx
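In Python, the table lookup can be replaced by the F distribution in scipy.stats. The sketch below uses a placeholder F ratio and placeholder degrees of freedom (not the answers to this exercise); the p-value is the upper-tail probability, which `stats.f.sf` returns directly.

```python
from scipy import stats

# Placeholder values, NOT the answers to the exercise.
F_ratio = 21.0
vT = 2  # treatment (numerator) degrees of freedom
vR = 6  # residual (denominator) degrees of freedom

# Survival function = 1 - CDF = probability of an F ratio at least this large.
p_value = stats.f.sf(F_ratio, vT, vR)

# The same number written with the cumulative distribution function.
p_value_from_cdf = 1.0 - stats.f.cdf(F_ratio, vT, vR)

print(p_value, p_value_from_cdf)
```

Using `sf` rather than `1 - cdf` is slightly more accurate when the p-value is very small, but both give the same answer at the precision needed here.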

Once we've put in the hard work of calculating and understanding the deviations, residuals, sums of
squares, degrees of freedom, mean squares, F ratio, and p-value, we simply run the one-line built-in
function and see how powerful Python can be in reducing the effort required to do statistics.

scipy.stats.f_oneway() <---- this line of code does the entire exercise for you.
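A minimal usage sketch of that call follows. The three lists are placeholder samples, not the paint data, and the variable names are only illustrative; `f_oneway` takes one sequence of observations per group.

```python
from scipy import stats

# Placeholder samples, NOT the exercise data: one list per treatment.
supplier_a = [1.0, 2.0, 3.0]
supplier_b = [2.0, 3.0, 4.0]
supplier_c = [6.0, 7.0, 8.0]

result = stats.f_oneway(supplier_a, supplier_b, supplier_c)

print("F ratio:", result.statistic)
print("p-value:", result.pvalue)
```

The F ratio and p-value printed here should match the values obtained from the hand calculation on the same data.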

1) Paint used for marking lanes on highways must be very durable. In one trial, paint from four
different suppliers, labeled GS, FD, L, and ZK, was tested on six different highway sites, denoted
1, 2, 3, 4, 5, 6. After a considerable length of time, which included different levels of traffic and
weather, the average wear for the samples at the six sites was as follows:

a) Perform an ANOVA calculation by hand, as follows (it may be helpful to follow along on Module 2,
slides 56-59):

i) Determine

ii) Determine the values of:

iii) Complete the ANOVA table

iv) Calculate the ANOVA coefficients

(1) SD:

(2) υD:

(3) ST:

(4) υT:

(5) SR:

(6) υR:

v) Determine the F value:

(1) mT:

(2) mR:

(3) F:

vi) Calculate the p-value using Python or another statistical tool

b) Now, use Python or another statistical tool to compare every pair of suppliers and provide p-values:

i) GS vs FD:

ii) GS vs L:

iii) GS vs ZK:

iv) FD vs L:

v) FD vs ZK:

vi) ZK vs L:

c) Which supplier is statistically the best (i.e., has the least wear)?

d) Which supplier is the worst?
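For part b), one possible way to loop over every pair of suppliers is sketched below. It assumes an independent two-sample t test (`scipy.stats.ttest_ind` with its default equal-variance setting); whether that or a paired test is intended is not stated in this handout, so treat the choice as an assumption. The supplier labels and numbers are placeholders, not the exercise data.

```python
from itertools import combinations
from scipy import stats

# Placeholder samples, NOT the exercise data: one list of wear values per supplier.
wear = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [6.0, 7.0, 8.0],
}

pairwise_p = {}
for (name_1, sample_1), (name_2, sample_2) in combinations(wear.items(), 2):
    # Assumption: independent two-sample t test with equal variances.
    result = stats.ttest_ind(sample_1, sample_2)
    pairwise_p[(name_1, name_2)] = result.pvalue
    print(f"{name_1} vs {name_2}: p = {result.pvalue:.4f}")
```

With four suppliers the same loop produces the six comparisons listed in part b). Comparing the group means alongside these p-values helps with parts c) and d), since lower wear is better.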
