Biomedical Engineering - 05 - Statistical Analysis

HUMAN MEDICAL BIOENGINEERING (Fall 2022)

Statistical Analysis:
ANOVA & Tukey’s Multiple Comparison

Dr. KYOBUM KIM

DONGGUK UNIVERSITY
DEPARTMENT OF CHEMICAL & BIOCHEMICAL ENGINEERING
Subjects

• ANOVA (Analysis of Variance)

• Student's t-test

• P value
General Example
General Example in Biological Experiments
General Example in Clinical Experiments
ANOVA

• ANOVA (analysis of variance) is a statistical technique for checking whether
the means of TWO or MORE independent (unrelated) groups are significantly
different from each other

• ANOVA checks the impact of one or more factors by comparing the means of
different groups (between groups)

• Variation between groups vs. variation within groups
ANOVA
Assumptions

(1) Independence of observations

• All samples are randomly selected and independent

(2) Normality of the populations

• Each population from which a sample is taken is assumed to be normal
• That is, the response variable is normally distributed within each group

(3) Homogeneity of variance

• The populations are assumed to have equal standard deviations (or variances)

• The factor variable is a categorical variable


• The response variable is a numerical variable
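The normality and equal-variance assumptions can be screened before running ANOVA. The sketch below is one possible way to do this in Python, assuming SciPy is available; the Shapiro-Wilk and Levene tests are a choice made here for illustration (the slides do not name specific tests), and the data values are hypothetical.

```python
from scipy import stats

# Hypothetical measurements for three treatment groups (illustrative only).
groups = {
    "drug 1": [4.1, 3.8, 4.5, 4.0, 4.3],
    "drug 2": [5.2, 4.9, 5.6, 5.1, 5.4],
    "drug 3": [6.0, 6.4, 5.8, 6.1, 6.3],
}

# Normality: Shapiro-Wilk test on each group separately.
# A p value above the chosen alpha means normality is not rejected.
normality_p = {name: stats.shapiro(values).pvalue for name, values in groups.items()}

# Homogeneity of variance: Levene's test across all groups together.
levene_p = stats.levene(*groups.values()).pvalue

for name, p in normality_p.items():
    print(f"{name}: Shapiro-Wilk p = {p:.3f}")
print(f"Levene p = {levene_p:.3f}")
```

With small samples these tests have little power, so they are best read as a rough screen rather than proof that the assumptions hold.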
Assumptions
ANOVA in Excel
Hypothesis

• The null hypothesis is simply that all the group population means are the
same
• The alternative hypothesis is that at least one pair of means is different
• For example, if there are k groups:
  H0: μ1 = μ2 = … = μk
  Ha: μi ≠ μj for at least one pair (i, j)
Hypothesis

• H0: μ1 = μ2 = μ3. If the null hypothesis is true, the three populations have
the same distribution.
• The variance of the combined data is approximately the same as the variance
of each of the populations.
• H0 is true: all means are the same, and the observed differences are due to
random variation.

• If the null hypothesis is false, the variance of the combined data is larger,
because the different means spread the data apart (as shown in the second
graph).
• H0 is not true: not all means are the same, and the differences are too large
to be due to random variation.
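The contrast between the two cases can be checked numerically. The following is a minimal sketch using only the Python standard library; the two datasets are hypothetical and were chosen so that every group has the same within-group spread.

```python
from statistics import variance

# Hypothetical samples; every group has the same within-group spread.
same_means = [[9, 10, 11], [9, 10, 11], [9, 10, 11]]
different_means = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]

def pooled(groups):
    """Flatten all groups into one combined data set."""
    return [x for g in groups for x in g]

combined_same = variance(pooled(same_means))
combined_different = variance(pooled(different_means))

for label, groups, combined in [
    ("H0 true", same_means, combined_same),
    ("H0 false", different_means, combined_different),
]:
    within = [variance(g) for g in groups]
    print(f"{label}: within-group variances = {within}, combined = {combined:.2f}")
```

When the means coincide, the combined variance (0.75) stays close to the within-group variance (1). When the means differ, the combined variance (19.5) is far larger than any within-group variance.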
Hypothesis
Notation
Two Sources of Variability

(1) Variance between groups

• Any variance due to the treatments in a randomized experiment
• ex 1) experimental groups, different treatments, ...
• ex 2) drug 1, drug 2, drug 3, drug 4, ...
• ex 3) concentration 0, 10, 100, 1000, ...

(2) Variance within groups

• Any variance due to error (residual) within a group
• ex 1) the spread among the individual data points within each group
Two Sources of Variability
Total Variability: The Sum of Squares
Total Variability
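The slide figures for this part are not reproduced in the text. A standard way to write the decomposition is sketched below, assuming k groups, n_i observations in group i, group means \bar{x}_i, and grand mean \bar{x}. The names SS_T, SS_G, and SS_E are assumptions chosen to match the MSG/MSE notation used in this lecture.

```latex
\begin{aligned}
SS_T &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}\right)^2
      && \text{(total variability)} \\
SS_G &= \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}\right)^2
      && \text{(between groups)} \\
SS_E &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2
      && \text{(within groups)} \\
SS_T &= SS_G + SS_E
\end{aligned}
```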
ANOVA Model

• The F statistic is a ratio (a fraction): F = MSG/MSE

• A measure of the variability between treatments (the numerator) divided by a
measure of the variability within treatments (the denominator)

• If F is large, the variability between treatments is large relative to the
variation within treatments, and we reject the null hypothesis of equal
means.
• If F is small, the variability between treatments is small relative to the
variation within treatments, and we do not reject the null hypothesis of
equal means.
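The ratio can be computed by hand for small data sets. The sketch below uses only the Python standard library; the function name `f_statistic` and both data sets are hypothetical, and the formulas are the usual one-way ANOVA definitions of MSG and MSE.

```python
from statistics import mean

def f_statistic(groups):
    """Return (MSG, MSE, F) for a list of groups of numeric observations."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: group means around the grand mean.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return msg, mse, msg / mse

# Hypothetical data: same spread within groups, different group means.
similar = [[10, 11, 9, 10], [10, 9, 11, 10], [11, 10, 9, 10]]
shifted = [[10, 11, 9, 10], [14, 13, 15, 14], [18, 17, 19, 18]]

_, _, f_similar = f_statistic(similar)
_, _, f_shifted = f_statistic(shifted)
print(f"similar means: F = {f_similar:.2f}")
print(f"shifted means: F = {f_shifted:.2f}")
```

In the first data set all group means are equal, so MSG is 0 and F is 0. In the second, the means are far apart relative to the within-group spread and F is 96.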
ANOVA Model
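Degrees of Freedom (df)

• Between groups: dfG = k − 1
• Within groups (errors within samples): dfE = n − k
• Total: dfT = n − 1 = dfG + dfE (k groups, n observations in total)
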
Mean Square (MS)

• MS = SS/df
• A sum of squared deviations (SS) divided by the appropriate number of
degrees of freedom (df): MSG = SSG/dfG and MSE = SSE/dfE
• MSG: estimates the population variance plus the variance produced by the
differences between the group means
• MSE: an estimate of the population variance
Mean Square (MS)
F statistics

• Fstat = MSG/MSE
• If the null hypothesis is true, the F statistic follows an F distribution
with k − 1 degrees of freedom in the numerator and n − k in the denominator.
• If the alternative hypothesis is true, then F tends to be large. We reject H0
in favor of Ha if the F statistic is sufficiently large.
• Variances are never negative, so F is never negative. If the null hypothesis
is false, MSG will generally be larger than MSE, and the F ratio will be
larger than one.
F statistics
ANOVA in Excel
DECISION in ANOVA

• Reject H0 if Fstat > Fcritical

• (Excel) Fcritical = FINV(alpha, dfG, dfE)

• FINV: the inverse of the right-tailed F probability distribution (FDIST)
• If p = FDIST(x, ...), then FINV(p, ...) = x
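For readers working outside Excel, the same critical value can be obtained in Python. This is a sketch assuming SciPy is installed: `scipy.stats.f.isf` plays the role of FINV and `scipy.stats.f.sf` the role of FDIST. The values of k and n are hypothetical.

```python
from scipy import stats

alpha = 0.05
k, n = 3, 12                    # hypothetical: 3 groups, 12 observations in total
df_g, df_e = k - 1, n - k       # numerator and denominator degrees of freedom

# Right-tail critical value, the counterpart of FINV(alpha, dfG, dfE).
f_critical = stats.f.isf(alpha, df_g, df_e)

# Right-tail probability, the counterpart of FDIST(x, dfG, dfE).
# Feeding the critical value back in should return alpha.
tail_prob = stats.f.sf(f_critical, df_g, df_e)

print(f"F critical = {f_critical:.3f}")
print(f"right-tail probability at F critical = {tail_prob:.3f}")
```

The round trip mirrors the FINV/FDIST relationship stated above: the right-tail probability at the critical value equals alpha.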
DECISION in ANOVA

• There is only one critical region, in the right tail (the blue shaded region
in the figure).
• If the F statistic lands in the critical region, we reject the null
hypothesis and conclude that the means are significantly different.
• As before, we have to find the critical value that marks the cut-off for the
critical region.
DECISION in ANOVA

• Reject H0 if p value < alpha

• p value
• The probability of obtaining test results at least as extreme as the results
actually observed, under the assumption that the null hypothesis is correct
• Informally, a measure of how likely it is that the observed difference arose
by random chance alone
DECISION in ANOVA

• (Excel) p value = FDIST(Fstat, dfG, dfE)

• FDIST: the right-tailed F probability distribution, i.e. the probability of
an F value at least as large as Fstat

• If the p value is very small, we reject the null hypothesis in favor of the
alternative hypothesis: the means are not all the same.
• Typical thresholds: p < 0.05 or p < 0.01
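The p value decision can also be reproduced from raw data. The sketch below assumes SciPy is installed and uses `scipy.stats.f_oneway`, which returns the F statistic and p value of a one-way ANOVA; the three data sets are hypothetical.

```python
from scipy import stats

# Hypothetical measurements for three groups (illustrative only).
group_a = [10, 11, 9, 10]
group_b = [14, 13, 15, 14]
group_c = [18, 17, 19, 18]

alpha = 0.05
result = stats.f_oneway(group_a, group_b, group_c)

# The same p value computed from the F statistic and its degrees of freedom,
# mirroring FDIST(Fstat, dfG, dfE).
df_g = 3 - 1      # k - 1
df_e = 12 - 3     # n - k
p_from_f = stats.f.sf(result.statistic, df_g, df_e)

reject_h0 = bool(result.pvalue < alpha)
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.2e}, reject H0: {reject_h0}")
```

Both routes give the same p value, and comparing it with alpha leads to the same decision as comparing Fstat with Fcritical.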
DECISION in ANOVA
ANOVA in Excel
Example
Multiple comparison (Post Hoc comparison)

• From the result of ANOVA:

• If Fstat > Fcritical and p value < alpha, the group means are not all the
same
• ANOVA gives only a yes/no answer; it does not say which group's mean differs
from which

• Therefore, the means must be compared pairwise


Multiple comparison (Post Hoc comparison)
Tukey Test

• Tukey-Kramer method
• Tukey HSD (Honestly Significant Difference)
• A statistical tool for determining whether the difference between the means
of two groups is statistically significant
• Pairwise comparison / multiple comparison
• Based on the studentized range distribution
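A pairwise comparison of this kind can be sketched in Python, assuming a SciPy version that provides `scipy.stats.tukey_hsd` (1.8 or later). The data and group labels are hypothetical; groups A and B were chosen to be close together and group C far from both.

```python
from scipy import stats

# Hypothetical measurements (illustrative only).
group_a = [10, 11, 9, 10]
group_b = [11, 12, 10, 11]
group_c = [18, 17, 19, 18]
labels = ["A", "B", "C"]

# Tukey HSD compares every pair of groups using the studentized range distribution.
result = stats.tukey_hsd(group_a, group_b, group_c)

alpha = 0.05
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        p = result.pvalue[i, j]          # p value for the pair (i, j)
        verdict = "different" if p < alpha else "not different"
        print(f"{labels[i]} vs {labels[j]}: p = {p:.4f} ({verdict})")
```

Unlike the single yes/no answer from ANOVA, the output has one decision per pair, which identifies which group means differ from which.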
Multiple comparison (Post Hoc comparison)
Multiple comparison in Excel
Multiple comparison (Post Hoc comparison)
In-Class Assignment
In-Class Assignment

• Two major parameters define the experimental groups:

• (1) incorporated HA percentage (0, 10, and 20%)
• (2) cell seeding density (High and Low)
• = 6 experimental groups in total per time point (Day 1, 4, and 8)

• △: the calibrator group

• The dashed line: the fold change of the calibrator
In-Class Assignment

• + : statistically significant effect of HA amount, compared with the 0% HA
control group within the same cell seeding density group
• # : statistically significant difference between cell seeding density groups
within the same HA concentration group
• P < 0.05

• TWO conclusions should be drawn (one for + and one for #)

• In terms of +, …. (the effect of incorporated HA percentage)
• In terms of #, …. (the effect of cell seeding density)
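As a planning aid, the groups and the comparisons behind the two symbols can be enumerated. This is a sketch using only the Python standard library; the variable names are hypothetical, and it lists only which pairs are compared, not the measured data.

```python
from itertools import product

ha_percent = [0, 10, 20]        # incorporated HA percentage
seeding = ["High", "Low"]       # cell seeding density

# All experimental groups at one time point: 3 HA levels x 2 densities.
groups = list(product(ha_percent, seeding))

# "+" comparisons: each nonzero HA level vs the 0% HA control,
# within the same cell seeding density.
plus_pairs = [((0, d), (ha, d)) for d in seeding for ha in ha_percent if ha != 0]

# "#" comparisons: High vs Low seeding density within the same HA level.
hash_pairs = [((ha, "High"), (ha, "Low")) for ha in ha_percent]

print(f"{len(groups)} groups per time point")
for control, treated in plus_pairs:
    print(f"+ : {treated} vs {control}")
for high, low in hash_pairs:
    print(f"# : {high} vs {low}")
```

This gives 6 groups, 4 comparisons for + and 3 comparisons for # at each time point.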
