
Introduction to Quantitative Biology

BIOL2001 / BIOL6200
Semester 1, 2024

Probability & Statistics


Lecture 4
Dr Stephen Zozaya
stephen.zozaya@anu.edu.au
Research School of Biology
Today’s plan
Continuing our adventures into hypothesis testing and categorical data:
• Lots of recap to drive home important concepts.
• Recap on the χ2 goodness-of-fit test.
• Recap on Null Hypothesis Significance Testing.
• χ2 Homogeneity Test (testing independence with bivariate data)
- Be ready to actually try filling out a contingency table!

• Fisher’s Exact Test


Hypothesis testing – workflow

1. Define the Null Hypothesis and the Alternative Hypothesis.

2. Collect data.

3. Calculate a test statistic that compares the observed data to what we would expect if the null
hypothesis were true.

4. Define a threshold (alpha value; a) to determine whether a test statistic is statistically significant,
i.e., to determine whether to reject the null hypothesis based on a given p-value.

5. If the probability of obtaining the test statistic (the p-value) is below this threshold, we reject the
null hypothesis; otherwise we do not reject it.
P-values: what they are and what they are not
The probability of obtaining results at least as extreme as the observed result
assuming the null hypothesis is true.
The p-value...

• IS a statement about the probability of obtaining the respective data under the null hypothesis.

• is NOT the probability that a hypothesis is true.

• is NOT the probability that chance produced the results.

• is NOT a measure of how strong or important an effect is.

• IS strongly influenced by sample size.

The larger the sample size, the smaller the effect needed to produce a statistically significant p-value
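This sample-size effect can be seen directly (a sketch, not from the slides): the same 55:45 split tested against a 50:50 null at two sample sizes.

```r
# The same proportional deviation from a 50:50 null, at n = 100 and n = 10,000.
observed_small <- c(55, 45)      # n = 100
observed_large <- c(5500, 4500)  # n = 10,000

p_small <- chisq.test(observed_small, p = c(0.5, 0.5))$p.value
p_large <- chisq.test(observed_large, p = c(0.5, 0.5))$p.value

p_small  # ~0.32: not significant at alpha = 0.05
p_large  # far below 0.001: "significant", yet the effect size is identical
```

The identical 55% vs 45% split is nowhere near significant at n = 100 but overwhelmingly significant at n = 10,000.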
Null Hypothesis Significance Testing (NHST)

We use statistical hypothesis tests to decide whether or not to reject the null hypothesis.

The null hypothesis (H0) is the hypothesis we are testing.

How do we determine whether to reject H0 given the p-value?

We set a threshold – the alpha value, a – below which a p-value is considered to reflect a statistically
significant result (i.e., one where we reject H0).

a is usually set at 0.05; sometimes 0.01 or 0.001, depending on the field (ultimately arbitrary)

a represents the conditional probability of rejecting the null hypothesis when the null hypothesis is
true; the Type I error rate.
Hypothesis testing – sometimes we will be wrong!
If we reject H0 when it is actually true, that is called a Type I error.

The Type I error rate, when H0 is true, is equivalent to our alpha value a.
If we fail to reject H0 (i.e., we retain it) when it is indeed false, that is called a Type II error.

It’s a snake! (Type I error) It’s not a snake! (Type II error)

Cape Range Delma (legless lizard) Robust Burrowing Snake


Chi-Square (X2) Statistic
A function that compares each expected frequency to its observed frequency:

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i

Functions like these produce a test statistic, and this particular function is called the chi-square
(χ2) statistic. The index k is the degrees of freedom (df).

k = n – 1 (number of observations/categories – 1)

o is the observed value

e is the expected value
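A minimal sketch (not from the slides) of computing the statistic by hand in R, using the spider-plot counts from the previous lecture:

```r
# Chi-square goodness-of-fit by hand: 4 categories, so k = 4 - 1 = 3 df.
observed <- c(3, 9, 5, 3)
expected <- c(5.4506, 7.0858, 4.6058, 2.8578)

chi_sq <- sum((observed - expected)^2 / expected)
df     <- length(observed) - 1

chi_sq  # ~1.66
df      # 3
```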
Chi-Square (X2) Statistic
Figure from the previous lecture: frequency of plots with a given number of spiders.

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i ;  k = n − 1

k is the degrees of freedom.

The chi-square (χ2) statistic can be used to test whether observed data fit the model; that is,
whether the data can be explained by the model.

Goodness of fit – the discrepancy between the observed values and the expected values generated
by a model.
0.5
Distributions of χ2 Statistic
[Figure: χ2 probability density curves for k = 1, 2, 3, 4, 6, and 9]

• Area under the curve is probability.

• Degrees of freedom determines the shape.

• Whether a χ2 value is “statistically significant” depends on where it lands on the respective
curve and what the alpha value is.
Figuring out Degrees of Freedom
The number of degrees of freedom (df) is the total number of categories (or observations) minus the
number of categories (or observations) which we can calculate given the marginal (or total):

k = n – 1 for univariate data

# spiders    0        1        2        ≥3
Expected     5.4506   7.0858   4.6058   2.8578
Observed     3        9        5        3

Number of columns – 1 = 4 – 1 = 3 df

k = (r – 1) ∙ (c – 1) for bivariate data; r = number of rows, c = number of columns

          Wash hands   Don’t wash hands
Adult     84           22
Child     86           114

(2 rows – 1) ∙ (2 columns – 1) = 1 ∙ 1 = 1 df
Chi-Square Distribution

Chi-Square Probabilities – Critical Values (α – level of significance)

df 0.995 0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01 0.005
1 3.9e-5 1.5e-4 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589

pchisq(q, df, lower.tail = F)


χ2 Test

The critical value means that, for any random experiment with a given number of degrees of
freedom, 95% of the time the calculated χ2 value will be no greater than the critical value
(for α = 0.05).

For example, the critical value of 7.815 means that, for any random experiment with 3 degrees of
freedom, 95% of the time the calculated χ2 value will be 7.815 or less.

[Figure: χ2 density curve for df = 3; the critical value 7.815 marks the α = 0.05 boundary
between “retain hypothesis” and “reject hypothesis”]

pchisq(q, df, lower.tail = F) = 1 – pchisq(q, df, lower.tail = T)
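The table's critical values can be reproduced in R with the quantile function (a quick sketch, not part of the original slides):

```r
# Critical values are quantiles of the chi-square distribution;
# qchisq() is the inverse of pchisq().
crit <- qchisq(0.95, df = 3)   # upper 5% cut-off for 3 df
crit                           # ~7.815, matching the table

# The two tail conventions are complementary:
p_upper <- pchisq(7.815, df = 3, lower.tail = FALSE)
p_lower <- 1 - pchisq(7.815, df = 3, lower.tail = TRUE)
```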
The matter of independence and correlation
(or lack thereof)
χ2 Test & contingency tables
We have just seen the χ2 test used as a Goodness-of-Fit Test (the spider example in the previous lecture).

Now, we are going to see it applied as an "Independence Test", or "Test of Association", or "Homogeneity Test".

– A test to check whether the frequencies of one variable differ depending on the value of the other variable.

Are the two variables dependent?

Use of the χ2 test means we will continue to work with categorical data.

The "one variable vs. the other variable" means we will be working with bivariate data.

How do we analyse categorical bivariate data?


χ2 Test, contingency tables, & bivariate data
Example: A researcher is interested in the smoking habits of physics and medicine
students enrolled at the ANU.

The researcher wants to know whether there is an association between field of study and
smoking.

What is the null hypothesis?


H0: There is no association between field of study and smoking.

If smoking is not associated with field of study, it would mean that the probability of
being a smoker is independent from the probability of being in a specific field.
i.e., there is no relationship!
χ2 Test, contingency tables, & bivariate data
Is there an association between field of study and smoking?

The researcher surveyed a random sample of 120 students from both fields and
obtained the following results:

                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120
χ2 Test, contingency tables, & bivariate data
Is there an association between field of study and smoking?

The researcher surveyed a random sample of 120 students from both fields and
obtained the following results:
• What is the probability of being a Med student?
• What is the probability of being a smoker?

                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

P(Med) = 57/120 = 0.475
P(Smoker) = 38/120 = 0.3166
Rules of Probability
1. The Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
If A and B are mutually exclusive events (events that cannot occur together),
then the third term is 0, and the rule reduces to P(A or B) = P(A) + P(B).

2. The Complement Rule: P(not A) = 1 - P(A)


3. The Multiplication Rule: P(A & B) = P(A) * P(B|A) or
P(A & B) = P(B) * P(A|B) and P(A & B) = P(B & A)
If A and B are independent, we can reduce the formula to:
P(A & B) = P(A) * P(B). The term independent refers to any event whose
outcome is not affected by the outcome of another event.
4. Bayes' Theorem
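The addition rule can be checked numerically against the survey counts from the smoking example (a sketch, not from the slides):

```r
# Survey counts: 57 Medicine students (15 of them smokers), 38 smokers total, N = 120.
N <- 120
p_med        <- 57 / N
p_smoker     <- 38 / N
p_med_smoker <- 15 / N   # joint: Medicine AND smoker

# Addition rule: P(Med or Smoker) = P(Med) + P(Smoker) - P(Med and Smoker)
p_union <- p_med + p_smoker - p_med_smoker

# Direct count of the union: all 57 Medicine students plus the 23 Physics smokers
p_union_direct <- (57 + 23) / N

p_union          # 0.6667
p_union_direct   # same value, counted directly
```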
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

If smoking and field of study are independent, we expect the joint frequencies to be:
Have a go at filling this out: P(X & Y) = P(X)∙P(Y); E(X & Y) = P(X & Y)∙N

Worked cell (Medicine & Smoker): 0.475 × 0.3166 = 0.1504; 0.1504 × 120 = 18.05 students

                 Non-smokers   Smokers   Marginal freqs
Medicine         ?             18.05     57    P(Med) = 0.475
Physics          ?             ?         63    P(Phy) = 1 − 0.475
Marginal freqs   82            38        120

P(Non-smoker) = 1 − 0.3166; P(Smoker) = 0.3166
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

If smoking and field of study are independent, we expect the joint frequencies to be:
P(X & Y) = P(X)∙P(Y); E(X & Y) = P(X & Y)∙N

                 Non-smokers   Smokers   Marginal freqs
Medicine         38.95         18.05     57
Physics          43.05         19.95     63
Marginal freqs   82            38        120
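The full expected table can be generated in one step from the marginal totals (a sketch; the variable names are my own):

```r
# Expected counts under independence: E[i, j] = (row total_i * col total_j) / N
observed <- matrix(c(42, 15,
                     40, 23),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Medicine", "Physics"),
                                   c("Non-smokers", "Smokers")))

expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected   # Medicine: 38.95, 18.05; Physics: 43.05, 19.95
```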
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i

                 Non-smokers   Smokers     Marginal freqs
Medicine         E = 38.95     E = 18.05   57
Physics          E = 43.05     E = 19.95   63
Marginal freqs   82            38          120
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

Pause now and answer: How many degrees of freedom are there?
k = (r – 1) ∙ (c – 1); r = rows, c = columns

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i

Table with χ2 values:
                 Non-smokers   Smokers
Medicine         0.238         0.515
Physics          0.216         0.466
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

1 degree of freedom
k = (2 – 1) ∙ (2 – 1) = 1 ∙ 1 = 1

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i

Table with χ2 values:
                 Non-smokers   Smokers
Medicine         0.238         0.515
Physics          0.216         0.466
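The per-cell contributions and their sum can be computed directly (a sketch, reusing the observed and expected values from the slides):

```r
observed <- matrix(c(42, 15,
                     40, 23), nrow = 2, byrow = TRUE)
expected <- matrix(c(38.95, 18.05,
                     43.05, 19.95), nrow = 2, byrow = TRUE)

# Per-cell chi-square contributions: (o - e)^2 / e
contrib <- (observed - expected)^2 / expected
round(contrib, 3)   # ~0.239, 0.515, 0.216, 0.466 (rounding may differ slightly from the slide)

chi_sq <- sum(contrib)
chi_sq              # ~1.4366
```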
χ2 distribution

χ2_k = 1.435 (df = 1)

From the Chi-Square critical-values table shown earlier: for df = 1 and α = 0.05, the critical
value is 3.841, and our χ2 of 1.435 falls below it.

pchisq(1.435, 1, lower.tail = F) = 0.231
χ2 Test, contingency tables, & bivariate data
Is there an association between field of study and smoking?
• H0 was that there is no association between field of study and smoking.
• As biologists, we choose α = 0.05 (we are happy to be fools in 5% of cases).
• Calculated p-value = 0.231; this is larger than α = 0.05.
• We do not reject H0 – we cannot demonstrate an association between field of study
and smoking.

Non-smokers Smokers Marginal freqs


Medicine 42 15 57
Physics 40 23 63
Marginal freqs 82 38 120
How could we have done this using R?
Is there an association between field of study and smoking?

Use the chisq.test() function!

R code to run the test:

# enter the data into a matrix to represent the
# contingency table
smokers <- matrix(c(42, 15, 40, 23),
                  nrow = 2, ncol = 2, byrow = TRUE)

# calculate the chi-square test:
chisq.test(smokers, correct = FALSE)

                 Non-smokers   Smokers
Medicine         42            15
Physics          40            23

Results:

Pearson's Chi-squared test

data: smokers
X-squared = 1.4366, df = 1, p-value = 0.2307
Limitations of the χ2 Test
All observations must be independent; an individual cannot fit into more than one category.

In the previous example, the test does not allow for a student to be both a Med student and a Physics student.

Expected values for any one category should not be fewer than 5, and never fewer than 1; otherwise the results are unreliable.

It is recommended that χ2 not be used when the total number of observations is fewer than 50 (more is better).

What to do when either of these is the case?

Fisher’s Exact Test!

                 Non-smokers   Smokers   Marginal freqs
Medicine         10            2         12
Physics          29            6         35
Marginal freqs   39            8         47
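A quick way to check these rules of thumb before choosing a test (a sketch; the check is my own, not from the lecture):

```r
# Does this table meet the usual chi-square rules of thumb
# (all expected counts >= 5, total N >= 50)?
small_table <- matrix(c(10, 2,
                        29, 6), nrow = 2, byrow = TRUE)

expected <- outer(rowSums(small_table), colSums(small_table)) / sum(small_table)

chi_square_ok <- min(expected) >= 5 && sum(small_table) >= 50
chi_square_ok   # FALSE: smallest expected count is ~2.04 and N = 47, so use Fisher's Exact Test
```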
Fisher’s Exact Test – when sample sizes are small
The test provides an exact p-value for a test of association.

Fisher’s Exact Test is useful when expected values in any cell are fewer than 5.

              Variable A1   Variable A2
Variable B1   8             9
Variable B2   5             2

Better than χ2 where the expected frequencies are too low to meet the rules demanded by the χ2
approximation.

Fisher's exact test was developed for a 2 x 2 contingency table with fixed row and column totals,
but it can be expanded to larger tables.

The calculations in Fisher’s Exact Test are cumbersome, especially when the counts are large.

Fisher's exact test is always more conservative than the χ2 test.

Doing a Fisher’s Exact Test in R
Is there an association between field of study and smoking?

Use the fisher.test() function!

# enter the data into a matrix to represent the
# contingency table
smokers <- matrix(c(42, 15, 40, 23),
                  nrow = 2, ncol = 2, byrow = TRUE)

# calculate the Fisher's test:
fisher.test(smokers)

                 Non-smokers   Smokers
Medicine         42            15
Physics          40            23

Fisher's Exact Test for Count Data

data: smokers
p-value = 0.2457
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.6890041 3.8158559
sample estimates:
odds ratio
  1.603587
Doing a Fisher’s Exact Test in R
Is there an association between field of study and smoking?

Compare results from the two tests:

Results of χ2 Test:

Pearson's Chi-squared test

data: smokers
X-squared = 1.4366, df = 1, p-value = 0.2307

Results of Fisher's Exact Test:

Fisher's Exact Test for Count Data

data: smokers
p-value = 0.2457
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.6890041 3.8158559
sample estimates:
odds ratio
  1.603587

Similar results in terms of p-value, with Fisher’s Test being slightly more conservative (as expected).
χ2 Test, contingency tables, & bivariate data
Is there an association between field of study and smoking?

- Pause now and formulate this problem in the context of the hypothesis testing workflow.

1. Define the Null Hypothesis and the Alternative Hypothesis.

2. Collect data.

3. Calculate a test statistic that compares the observed data to what we would expect if the null
hypothesis were true.

4. Define a threshold (alpha value; a) to determine whether a test statistic is statistically significant,
i.e., to determine whether to reject the null hypothesis based on a given p-value.

5. If the probability of obtaining the test statistic (the p-value) is below this threshold, we reject the
null hypothesis; otherwise we do not reject it.
Hypothesis testing – workflow
1. Define the Null Hypothesis and the Alternative Hypothesis.
H0 – smoking and field of study are independent (not related). Probability of being a smoker is
the same regardless of the field of study.

2. Collect data.
Collect observations (run questionnaire). Build the contingency table, calculate expected values
under the null model (probs are independent).

3. Calculate a test statistic that compares the observed data to what we would expect if the null
hypothesis were true.
Calculate the χ2 statistic and the number of degrees of freedom:
χ2 = 1.435
k=1
Hypothesis testing – workflow

4. Define a threshold (alpha value; a) to determine whether a test statistic is statistically significant,
i.e., to determine whether to reject the null hypothesis based on a given p-value.
Let's choose the commonly used α = 0.05 threshold.

5. If the probability of obtaining the test statistic (the p-value) is below this threshold, we reject the
null hypothesis; otherwise we do not reject it.
Calculated probability is 0.231. It is larger than the α threshold of 0.05.
We therefore retain (i.e., cannot reject) our null hypothesis, H0.
H0 – smoking and field of study are independent (not related). The probability of being a
smoker is the same regardless of the field of study.
Hypothesis Testing: field of study & smoking

The epicrisis (a critical or analytical summary):


We could not reject the H0 and therefore cannot conclude that there is a statistically
significant relationship (i.e., dependence) between field of study (physics & medicine)
and the probability of smoking.

If we were to reject H0 anyway (thus accepting the H1 alternative – that smoking and
field of study are indeed associated, i.e., not independent), the probability of us
making a Type I error would be 0.231.

That is, there would be a 23% chance that we are committing a Type I error, whereas the
typically accepted Type I error rate in Biology is 5%.
Summary
• A Type I error is when we reject H0 when it is actually true.
• A Type II error is when we retain H0 when it is actually false.
• The number of degrees of freedom (df) is the total number of categories (or observations) minus the number
of categories (or observations) which we can calculate given the marginal (or total):
• k = n – 1 for univariate data (e.g., the spider example last lecture had 4 columns; 4 – 1 = 3 df)
• k = (r – 1) ∙ (c – 1) for bivariate data; r = number of rows, c = number of columns
• The χ2 goodness-of-fit and homogeneity tests are similar in terms of calculations.
• They differ in that a goodness-of-fit test compares against a known distribution, whereas homogeneity test
checks if one categorical variable correlates with the other variable (whether they are dependent).
• Both tests get “sketchy” if values in any cell are fewer than 5, and really, really sketchy if fewer than 1 (zero).
• When this is the case, or number of total observations < 50, use Fisher’s Exact Test!
