Biomedical Engineering - 05 - Statistical Analysis

Statistical Analysis:
ANOVA & Tukey’s Multiple Comparison



• ANOVA (Analysis of Variance)

• Student's t-test

• p value
General Example
General Example in Biological Experiments
General Example in Clinical Experiments
• ANOVA (Analysis of Variance) is a statistical technique to check whether the
means of two or more independent (unrelated) groups are significantly different
from each other

• ANOVA checks the impact of one or more factors by comparing the means of
different groups (between groups)

• Variation between groups (집단간 변이) vs. variation within groups (집단내 변이)


(1) Independence of observations (표본의 독립성)

• All samples are randomly selected and independent

(2) Normality (모집단의 정규성)

• Each population from which a sample is taken is assumed to be normally distributed
• Equivalently, the response variable is normally distributed within each group

(3) Homogeneity of variance (모집단의 등분산)

• The populations are assumed to have equal standard deviations (or variances)

• The factor variable is a categorical variable

• The response variable is a numerical variable
ANOVA in Excel

• The null hypothesis (귀무가설) is that all the group population means are
the same
• The alternative hypothesis (대립가설) is that at least one pair of means is
different
• For example, if there are k groups:
• H0: μ1 = μ2 = … = μk
• Ha: μi ≠ μj for at least one pair (i, j)

• H0: μ1 = μ2 = μ3, and the three populations have the same distribution if
the null hypothesis is true.
• The variance of the combined data is approximately the same as the variance
of each of the populations.
• H0 is true. All means are the same; the differences are due to random
variation.

• If the null hypothesis is false, then the variance of the combined data is
larger, which is caused by the different means (as shown in the second graph).
• H0 is not true. Not all means are the same; the differences are too large
to be due to random variation.
Two Sources of Variability

(1) Variance between groups (집단간 변이)

• Any variance due to treatments for a randomized experiment
• ex 1) experimental groups, different treatments…
• ex 2) drug 1, drug 2, drug 3, drug 4….
• ex 3) concentration Zero, conc 10, conc 100, conc 1000….

(2) Variance within groups (집단내 변이)

• Any variance due to error (residual) within a group
• ex 1) the spread among the several data points in each group
Total Variability: The Sum of Squares
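
The total variability named in the heading above can be written as a sum-of-squares decomposition. The following is a sketch in standard notation, not a formula recovered from the slides: it assumes k groups, n_i observations x_{ij} in group i, group means \bar{x}_i, and a grand mean \bar{x}. The labels SST, SSG, and SSE are chosen here to match the MSG and MSE used elsewhere in these notes.

```latex
\begin{align}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}\right)^2
      && \text{total variability} \\
SSG &= \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}\right)^2
      && \text{between groups} \\
SSE &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2
      && \text{within groups (error)} \\
SST &= SSG + SSE
\end{align}
```

The last line states that the total variability splits exactly into the between-group and within-group parts, which correspond to the two sources of variability listed earlier.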

• The F statistic is a ratio (a fraction) = MSG/MSE

• A measure of the variability between treatments (the numerator, 분자)
divided by a measure of the variability within treatments (the denominator,
분모)

• If F is large, the variability between treatments is large relative to the
variation within treatments, and we reject the null hypothesis of equal
means.
• If F is small, the variability between treatments is small relative to the
variation within treatments, and we do not reject the null hypothesis of
equal means.
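Degrees of Freedom (df)

• dfG = k − 1: between groups (k = number of groups)
• dfE = n − k: errors within samples (n = total number of observations)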

Mean Square (MS)

• = SS/df
• A sum of squared deviations (SS) divided by the appropriate number of
degrees of freedom (df): MSG = SSG/dfG and MSE = SSE/dfE
• MSG: consists of the population variance plus a variance produced from the
differences between the samples
• MSE: an estimate of the population variance
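
The quantities defined above (SS, df, MS, and the F ratio) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the Excel procedure from the lecture. The function name `one_way_anova` and the three small samples are hypothetical, made up so that the arithmetic is easy to check by hand.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total number of observations
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: group means vs. the grand mean
    ssg = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-group (error) sum of squares: observations vs. their group mean
    sse = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g)

    df_g, df_e = k - 1, n - k                    # degrees of freedom
    msg, mse = ssg / df_g, sse / df_e            # mean squares = SS / df
    return {"SSG": ssg, "SSE": sse, "dfG": df_g, "dfE": df_e,
            "MSG": msg, "MSE": mse, "F": msg / mse}


# Hypothetical data: three groups with three observations each
samples = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
table = one_way_anova(samples)
print(table)
```

For these made-up samples the group means are 5, 7, and 9 and the grand mean is 7, so SSG = 24, SSE = 6, MSG = 12, MSE = 1, and F = 12.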
F statistics

• Fstat = MSG/MSE
• If the null hypothesis is true, the F statistic follows an F distribution
with k − 1 degrees of freedom in the numerator and n − k in the denominator.
• If the alternative hypothesis is true, then F tends to be large. We reject H0
in favor of Ha if the F statistic is sufficiently large.
• Since variances are always positive, the F-ratio is always positive. If the
null hypothesis is false, MSG will generally be larger than MSE, so the F-ratio
will be larger than one.
ANOVA in Excel

• Reject H0 if Fstat > Fcritical

• (Excel) Fcritical = FINV(alpha, dfG, dfE)

• FINV: the inverse of the right-tailed F probability distribution (FDIST)
• If p = FDIST(x,...), then FINV(p,...) = x

• There is only one critical region, in the right tail (shown as the blue
shaded region above).
• If the F statistic lands in the critical region, we can conclude that the
means are significantly different and we reject the null hypothesis.
• We have to find the critical value to determine the cut-off for the
critical region.

• Reject H0 if alpha > p value

• p value
• a measure of the probability that an observed difference could have
occurred just by random chance
• the probability of obtaining test results at least as extreme as the results
actually observed, under the assumption that the null hypothesis is correct

• (Excel) p value = FDIST (Fstat, dfG, dfE)

• FDIST: the right-tailed F probability distribution (degree of diversity) for
two data sets

• If the p value is smaller than alpha, the null hypothesis is rejected (= the
alternative hypothesis is selected). Therefore, at least one mean is different.
• In general, alpha = 0.05 or 0.01 is used (p < 0.05 or p < 0.01)
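
The two decision rules (Fstat vs. Fcritical, and p value vs. alpha) can also be checked outside Excel. The sketch below assumes SciPy is installed and uses `scipy.stats.f`, whose `sf` and `isf` methods are right-tailed, like FDIST and FINV. The numbers (F = 12 with dfG = 2 and dfE = 6) are hypothetical values chosen for illustration, not results from the lecture.

```python
from scipy.stats import f

alpha = 0.05
# Hypothetical ANOVA result: 3 groups (dfG = 2), 9 observations (dfE = 6)
f_stat, df_g, df_e = 12.0, 2, 6

# Right-tail probability, the counterpart of Excel's FDIST(Fstat, dfG, dfE)
p_value = f.sf(f_stat, df_g, df_e)

# Right-tail critical value, the counterpart of Excel's FINV(alpha, dfG, dfE)
f_critical = f.isf(alpha, df_g, df_e)

# The two rules always lead to the same decision
reject_by_f = f_stat > f_critical
reject_by_p = p_value < alpha

# FINV undoes FDIST: feeding the p value back returns the F statistic
f_recovered = f.isf(p_value, df_g, df_e)

print(f"p value    = {p_value:.4f}")
print(f"F critical = {f_critical:.4f}")
print(f"reject H0  = {reject_by_f and reject_by_p}")
```

With these inputs the p value is 0.008 and the critical value is about 5.14, so both rules reject H0 at alpha = 0.05.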
Multiple comparison (Post Hoc comparison)

• From the result of ANOVA…

• If Fstat > Fcritical and alpha > p value…
• the group means are not all the same (ANOVA only answers yes or no)
• However, ANOVA does not tell us which group's mean value is different
from which…

• Therefore, it is necessary to compare the means pairwise

Tukey Test

• Tukey-Kramer method
• Tukey HSD (Honestly Significant Difference)
• Statistical tool used to determine whether the difference between the means
of two groups is statistically significant
• Pairwise Comparison/ Multiple comparison
• Studentized range distribution
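
The pairwise comparison behind the Tukey-Kramer method can be sketched as follows. This is a minimal illustration under stated assumptions, not the Excel workflow from the lecture. The function name `tukey_kramer` and the samples are hypothetical, and the critical value `q_crit` is not computed here: it must be looked up in a studentized range table. The value 4.34 used below is the tabulated value for alpha = 0.05, 3 groups, and 6 error degrees of freedom.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_kramer(groups, q_crit):
    """Compare every pair of group means against a studentized range cut-off.

    groups: dict mapping a group name to its list of observations
    q_crit: critical value of the studentized range distribution (from a table)
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    means = {name: mean(v) for name, v in groups.items()}

    # Pooled within-group variance (MSE), as in the ANOVA table
    sse = sum((x - means[name]) ** 2
              for name, v in groups.items() for x in v)
    mse = sse / (n_total - k)

    results = []
    for a, b in combinations(groups, 2):
        diff = abs(means[a] - means[b])
        # Tukey-Kramer standard error, valid for unequal group sizes
        se = sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q = diff / se
        results.append((a, b, diff, q, q > q_crit))
    return results


# Hypothetical data: three groups with three observations each
data = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}
comparisons = tukey_kramer(data, q_crit=4.34)
for a, b, diff, q, significant in comparisons:
    print(f"{a} vs {b}: diff = {diff}, q = {q:.3f}, significant = {significant}")
```

For these made-up samples only the A vs C pair exceeds the cut-off, which shows how the post hoc test identifies which means differ after ANOVA has reported that a difference exists.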
Multiple comparison in Excel
In-Class Assignment

• There are two major parameters for designing the experimental groups:

• (1) incorporated HA percentage (0, 10, and 20%),
• (2) cell seeding density (High and Low)
• = 3 × 2 = 6 experimental groups in total per time point (Day 1, 4, and 8)

• △: the calibrator group

• The dashed line: the fold change of the calibrator

• + : statistical difference in HA amount within the same cell seeding density
group compared with the 0% HA control group
• # : statistical difference between cell seeding density groups within the
same HA concentration group
• p < 0.05
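
To keep track of which comparisons the + and # symbols refer to, the experimental groups and the pairs being compared can be enumerated. This is only a bookkeeping sketch based on the design described above. The variable names are hypothetical and no assignment data are used.

```python
from itertools import product

ha_percent = [0, 10, 20]        # incorporated HA percentage
densities = ["High", "Low"]     # cell seeding density
days = [1, 4, 8]                # time points

# 3 x 2 = 6 experimental groups at each time point
groups = list(product(ha_percent, densities))

# "+" comparisons: each HA level vs. the 0% HA control, same seeding density
plus_pairs = [((0, d), (ha, d))
              for d in densities for ha in ha_percent if ha != 0]

# "#" comparisons: High vs. Low seeding density, same HA percentage
hash_pairs = [((ha, "High"), (ha, "Low")) for ha in ha_percent]

print(f"{len(groups)} groups per time point, {len(days)} time points")
print(f"{len(plus_pairs)} '+' comparisons and "
      f"{len(hash_pairs)} '#' comparisons per time point")
```

Each time point therefore involves four + comparisons (effect of HA percentage) and three # comparisons (effect of seeding density).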

• Two determinations/conclusions should be made (+ and #):

• In terms of +, …. (the effect of incorporated HA percent)
• In terms of #, …. (the effect of cell seeding density)

