
Introduction to Quantitative Biology

BIOL2001 / BIOL6200
Semester 1, 2024

Probability & Statistics


Lecture 4
Dr Stephen Zozaya
stephen.zozaya@anu.edu.au
Research School of Biology
Today’s plan
Continuing our adventures into hypothesis testing and categorical data:
• Lots of recap to drive home important concepts.
• Recap on the χ2 goodness-of-fit test.
• Recap on Null Hypothesis Significance Testing.
• χ2 Homogeneity Test (testing independence with bivariate data)
- Be ready to actually try filling out a contingency table!

• Fisher’s Exact Test


Hypothesis testing – workflow

1. Define the Null Hypothesis and the Alternative Hypothesis.

2. Collect data.

3. Calculate a test statistic that compares the observed data to what we would expect if the null
hypothesis were true.

4. Define a threshold (alpha value; a) to determine whether a test statistic is statistically significant,
i.e., to determine whether to reject the null hypothesis based on a given p-value.

5. If the probability of obtaining the test statistic (the p-value) is below this threshold, we reject the
null hypothesis; otherwise we do not reject it.
P-values: what they are and what they are not
The probability of obtaining results at least as extreme as the observed result
assuming the null hypothesis is true.
The p-value...

• IS a statement about the probability of obtaining the respective data under the null hypothesis.

• is NOT the probability that a hypothesis is true.

• is NOT the probability that chance produced the results.

• is NOT a measure of how strong or important an effect is.

• IS strongly influenced by sample size.

The larger the sample size, the smaller the effect needed to produce a statistically significant p-value
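This sample-size effect can be seen directly (a sketch, not from the slides): the same 55:45 split tested against a 50:50 null at two sample sizes.

```r
# The same proportional deviation from a 50:50 null, at n = 100 and n = 10,000.
observed_small <- c(55, 45)      # n = 100
observed_large <- c(5500, 4500)  # n = 10,000

p_small <- chisq.test(observed_small, p = c(0.5, 0.5))$p.value
p_large <- chisq.test(observed_large, p = c(0.5, 0.5))$p.value

p_small  # ~0.32: not significant at alpha = 0.05
p_large  # far below 0.001: "significant", yet the effect size is identical
```

The identical 55% vs 45% split is nowhere near significant at n = 100 but overwhelmingly significant at n = 10,000.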
Null Hypothesis Significance Testing (NHST)

We use statistical hypothesis tests to decide whether or not to reject the null hypothesis.

The null hypothesis (H0) is the hypothesis we are testing.

How do we determine whether to reject H0 given the p-value?

We set a threshold – the alpha value, a – below which a p-value is considered to reflect a statistically
significant result (i.e., one where we reject H0).

a is usually set at 0.05; sometimes 0.01 or 0.001, depending on the field (ultimately arbitrary)

a represents the conditional probability of rejecting the null hypothesis when the null hypothesis is
true; the Type I error rate.
Hypothesis testing – sometimes we will be wrong!
If we reject H0 when it is actually true, that is called a Type I error.

The Type I error rate, when H0 is true, is equivalent to our alpha value a.
If we fail to reject H0 (i.e., we retain it) when it is indeed false, that is called a Type II error.

It’s a snake! (Type I error) It’s not a snake! (Type II error)

Cape Range Delma (legless lizard) Robust Burrowing Snake


Chi-Square (X2) Statistic
A function that compares each expected frequency to its observed frequency:

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i

Functions like these produce a test statistic, and this particular function is called the chi-square
(χ2) statistic. The index k is the degrees of freedom (df).

k = n – 1 (number of observations/categories – 1)

o is the observed value

e is the expected value
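A minimal sketch (not from the slides) of computing the statistic by hand in R, using the spider-plot counts from the previous lecture:

```r
# Chi-square goodness-of-fit by hand: 4 categories, so k = 4 - 1 = 3 df.
observed <- c(3, 9, 5, 3)
expected <- c(5.4506, 7.0858, 4.6058, 2.8578)

chi_sq <- sum((observed - expected)^2 / expected)
df     <- length(observed) - 1

chi_sq  # ~1.66
df      # 3
```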
Chi-Square (X2) Statistic
Figure from the previous lecture: frequency of plots with a given number of spiders.

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i ;  k = n − 1

k is the degrees of freedom.

The chi-square (χ2) statistic can be used to test whether observed data fit the model; that is,
whether the data can be explained by the model.

Goodness of fit – the discrepancy between the observed values and the expected values generated
by a model.
0.5
Distributions of χ2 Statistic
[Figure: χ2 probability density curves for k = 1, 2, 3, 4, 6, and 9]

• Area under the curve is probability.

• Degrees of freedom determines the shape.

• Whether a χ2 value is “statistically significant” depends on where it lands on the respective
curve and what the alpha value is.
Figuring out Degrees of Freedom
The number of degrees of freedom (df) is the total number of categories (or observations) minus the
number of categories (or observations) which we can calculate given the marginal (or total):

k = n – 1 for univariate data

# spiders    0        1        2        ≥3
Expected     5.4506   7.0858   4.6058   2.8578
Observed     3        9        5        3

Number of columns – 1 = 4 – 1 = 3 df

k = (r – 1) ∙ (c – 1) for bivariate data; r = number of rows, c = number of columns

          Wash hands   Don’t wash hands
Adult     84           22
Child     86           114

(2 rows – 1) ∙ (2 columns – 1) = 1 ∙ 1 = 1 df
Chi-Square Distribution

Chi-Square Probabilities – Critical Values (α – level of significance)

df 0.995 0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01 0.005
1 3.9e-5 1.5e-4 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589

pchisq(q, df, lower.tail = F)


χ2 Test

The critical value means that, for any random experiment with a given number of degrees of
freedom, 95% of the time the calculated χ2 value will be no greater than the critical value
(for α = 0.05).

For example, the critical value of 7.815 means that, for any random experiment with 3 degrees of
freedom, 95% of the time the calculated χ2 value will be 7.815 or less.

[Figure: χ2 density curve for df = 3; the critical value 7.815 marks the α = 0.05 boundary
between “retain hypothesis” and “reject hypothesis”]

pchisq(q, df, lower.tail = F) = 1 – pchisq(q, df, lower.tail = T)
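The table's critical values can be reproduced in R with the quantile function (a quick sketch, not part of the original slides):

```r
# Critical values are quantiles of the chi-square distribution;
# qchisq() is the inverse of pchisq().
crit <- qchisq(0.95, df = 3)   # upper 5% cut-off for 3 df
crit                           # ~7.815, matching the table

# The two tail conventions are complementary:
p_upper <- pchisq(7.815, df = 3, lower.tail = FALSE)
p_lower <- 1 - pchisq(7.815, df = 3, lower.tail = TRUE)
```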
The matter of independence and correlation
(or lack thereof)
χ2 Test & contingency tables
We have just seen the χ2 test used as a Goodness-of-Fit Test (the spider example in the previous lecture).

Now, we are going to see it applied as an "Independence Test", or "Test of Association", or "Homogeneity Test".

– A test to check whether the frequencies of one variable differ depending on the value of the other variable.

Are the two variables dependent?

Use of the χ2 test means we will continue to work with categorical data.

The "one variable vs. the other variable" means we will be working with bivariate data.

How do we analyse categorical bivariate data?


χ2 Test, contingency tables, & bivariate data
Example: A researcher is interested in the smoking habits of physics and medicine
students enrolled at the ANU.

The researcher wants to know whether there is an association between field of study and
smoking.

What is the null hypothesis?


H0: There is no association between field of study and smoking.

If smoking is not associated with field of study, it would mean that the probability of
being a smoker is independent from the probability of being in a specific field.
i.e., there is no relationship!
χ2 Test, contingency tables, & bivariate data
Is there an association between field of study and smoking?

The researcher surveyed a random sample of 120 students from both fields and
obtained the following results:

                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120
χ2 Test, contingency tables, & bivariate data
Is there an association between field of study and smoking?

The researcher surveyed a random sample of 120 students from both fields and
obtained the following results:
• What is the probability of being a Med student?
• What is the probability of being a smoker?

                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

P(Med) = 57/120 = 0.475
P(Smoker) = 38/120 = 0.3166
Rules of Probability
1. The Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
If A and B are mutually exclusive events (events that cannot occur together),
then the third term is 0, and the rule reduces to P(A or B) = P(A) + P(B).

2. The Complement Rule: P(not A) = 1 - P(A)


3. The Multiplication Rule: P(A & B) = P(A) * P(B|A) or
P(A & B) = P(B) * P(A|B) and P(A & B) = P(B & A)
If A and B are independent, we can reduce the formula to:
P(A & B) = P(A) * P(B). The term independent refers to any event whose
outcome is not affected by the outcome of another event.
4. Bayes' Theorem
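The addition rule can be checked numerically against the survey counts from the smoking example (a sketch, not from the slides):

```r
# Survey counts: 57 Medicine students (15 of them smokers), 38 smokers total, N = 120.
N <- 120
p_med        <- 57 / N
p_smoker     <- 38 / N
p_med_smoker <- 15 / N   # joint: Medicine AND smoker

# Addition rule: P(Med or Smoker) = P(Med) + P(Smoker) - P(Med and Smoker)
p_union <- p_med + p_smoker - p_med_smoker

# Direct count of the union: all 57 Medicine students plus the 23 Physics smokers
p_union_direct <- (57 + 23) / N

p_union          # 0.6667
p_union_direct   # same value, counted directly
```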
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

If smoking and field of study are independent, we expect the joint frequencies to be:
Have a go at filling this out: P(X & Y) = P(X)∙P(Y); E(X & Y) = P(X & Y)∙N

Worked cell (Medicine & Smoker): 0.475 × 0.3166 = 0.1504; 0.1504 × 120 = 18.05 students

                 Non-smokers   Smokers   Marginal freqs
Medicine         ?             18.05     57    P(Med) = 0.475
Physics          ?             ?         63    P(Phy) = 1 − 0.475
Marginal freqs   82            38        120

P(Non-smoker) = 1 − 0.3166; P(Smoker) = 0.3166
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

If smoking and field of study are independent, we expect the joint frequencies to be:
P(X & Y) = P(X)∙P(Y); E(X & Y) = P(X & Y)∙N

                 Non-smokers   Smokers   Marginal freqs
Medicine         38.95         18.05     57
Physics          43.05         19.95     63
Marginal freqs   82            38        120
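The full expected table can be generated in one step from the marginal totals (a sketch; the variable names are my own):

```r
# Expected counts under independence: E[i, j] = (row total_i * col total_j) / N
observed <- matrix(c(42, 15,
                     40, 23),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Medicine", "Physics"),
                                   c("Non-smokers", "Smokers")))

expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected   # Medicine: 38.95, 18.05; Physics: 43.05, 19.95
```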
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i

                 Non-smokers   Smokers     Marginal freqs
Medicine         E = 38.95     E = 18.05   57
Physics          E = 43.05     E = 19.95   63
Marginal freqs   82            38          120
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

Pause now and answer: How many degrees of freedom are there?
k = (r – 1) ∙ (c – 1); r = rows, c = columns

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i

Table with χ2 values:
                 Non-smokers   Smokers
Medicine         0.238         0.515
Physics          0.216         0.466
χ2 Test, contingency tables, & bivariate data
                 Non-smokers   Smokers   Marginal freqs
Medicine         42            15        57
Physics          40            23        63
Marginal freqs   82            38        120

1 degree of freedom
k = (2 – 1) ∙ (2 – 1) = 1 ∙ 1 = 1

χ2_k = Σ_{i=1}^{n} (o_i − e_i)^2 / e_i

Table with χ2 values:
                 Non-smokers   Smokers
Medicine         0.238         0.515
Physics          0.216         0.466
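The per-cell contributions and their sum can be computed directly (a sketch, reusing the observed and expected values from the slides):

```r
observed <- matrix(c(42, 15,
                     40, 23), nrow = 2, byrow = TRUE)
expected <- matrix(c(38.95, 18.05,
                     43.05, 19.95), nrow = 2, byrow = TRUE)

# Per-cell chi-square contributions: (o - e)^2 / e
contrib <- (observed - expected)^2 / expected
round(contrib, 3)   # ~0.239, 0.515, 0.216, 0.466 (rounding may differ slightly from the slide)

chi_sq <- sum(contrib)
chi_sq              # ~1.4366
```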
χ2 distribution

χ2_k = 1.435 (df = 1)

From the Chi-Square critical-values table shown earlier: for df = 1 and α = 0.05, the critical
value is 3.841, and our χ2 of 1.435 falls below it.

pchisq(1.435, 1, lower.tail = F) = 0.231
χ2 Test, contingency tables, & bivariate data
Is there an association between field of study and smoking?
• H0 was that there is no association between field of study and smoking.
• As biologists, we choose α = 0.05 (we are happy to be fools in 5% of cases).
• Calculated p-value = 0.231; this is larger than α = 0.05.
• We do not reject H0 – we cannot demonstrate an association between field of study
and smoking.

Non-smokers Smokers Marginal freqs


Medicine 42 15 57
Physics 40 23 63
Marginal freqs 82 38 120
How could we have done this using R?
Is there an association between field of study and smoking?

Use the chisq.test() function!

R code to run the test:

# enter the data into a matrix to represent the
# contingency table
smokers <- matrix(c(42, 15, 40, 23),
                  nrow = 2, ncol = 2, byrow = TRUE)

# calculate the chi-square test:
chisq.test(smokers, correct = FALSE)

                 Non-smokers   Smokers
Medicine         42            15
Physics          40            23

Results:

Pearson's Chi-squared test

data: smokers
X-squared = 1.4366, df = 1, p-value = 0.2307
Limitations of the χ2 Test
All observations must be independent; an individual cannot fit into more than one category.

In the previous example, the test does not allow for a student to be both a Med student and a Physics student.

Expected values for any one category should not be fewer than 5, and never fewer than 1; otherwise the results are unreliable.

It is recommended that χ2 not be used when the total number of observations is fewer than 50 (more is better).

What to do when either of these is the case?

Fisher’s Exact Test!

                 Non-smokers   Smokers   Marginal freqs
Medicine         10            2         12
Physics          29            6         35
Marginal freqs   39            8         47
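A quick way to check these rules of thumb before choosing a test (a sketch; the check is my own, not from the lecture):

```r
# Does this table meet the usual chi-square rules of thumb
# (all expected counts >= 5, total N >= 50)?
small_table <- matrix(c(10, 2,
                        29, 6), nrow = 2, byrow = TRUE)

expected <- outer(rowSums(small_table), colSums(small_table)) / sum(small_table)

chi_square_ok <- min(expected) >= 5 && sum(small_table) >= 50
chi_square_ok   # FALSE: smallest expected count is ~2.04 and N = 47, so use Fisher's Exact Test
```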
Fisher’s Exact Test – when sample sizes are small
The test provides an exact p-value for a test of association.

Fisher’s Exact Test is useful when expected values in any cell are fewer than 5.

              Variable A1   Variable A2
Variable B1   8             9
Variable B2   5             2

Better than χ2 where the expected frequencies are too low to meet the rules demanded by the χ2
approximation.

Fisher's exact test was developed for a 2 x 2 contingency table with fixed row and column totals,
but it can be expanded to larger tables.

The calculations in Fisher’s Exact Test are cumbersome, especially when the counts are large.

Fisher's exact test is always more conservative than the χ2 test.

Doing a Fisher’s Exact Test in R
Is there an association between field of study and smoking?

Use the fisher.test() function!

# enter the data into a matrix to represent the
# contingency table
smokers <- matrix(c(42, 15, 40, 23),
                  nrow = 2, ncol = 2, byrow = TRUE)

# calculate the Fisher's test:
fisher.test(smokers)

                 Non-smokers   Smokers
Medicine         42            15
Physics          40            23

Fisher's Exact Test for Count Data

data: smokers
p-value = 0.2457
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.6890041 3.8158559
sample estimates:
odds ratio
  1.603587
Doing a Fisher’s Exact Test in R
Is there an association between field of study and smoking?

Compare results from the two tests:

Results of χ2 Test:

Pearson's Chi-squared test

data: smokers
X-squared = 1.4366, df = 1, p-value = 0.2307

Results of Fisher's Exact Test:

Fisher's Exact Test for Count Data

data: smokers
p-value = 0.2457
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.6890041 3.8158559
sample estimates:
odds ratio
  1.603587

Similar results in terms of p-value, with Fisher’s Test being slightly more conservative (as expected).
χ2 Test, contingency tables, & bivariate data
Is there an association between field of study and smoking?

- Pause now and formulate this problem in the context of the hypothesis testing workflow.

1. Define the Null Hypothesis and the Alternative Hypothesis.

2. Collect data.

3. Calculate a test statistic that compares the observed data to what we would expect if the null
hypothesis were true.

4. Define a threshold (alpha value; a) to determine whether a test statistic is statistically significant,
i.e., to determine whether to reject the null hypothesis based on a given p-value.

5. If the probability of obtaining the test statistic (the p-value) is below this threshold, we reject the
null hypothesis; otherwise we do not reject it.
Hypothesis testing – workflow
1. Define the Null Hypothesis and the Alternative Hypothesis.
H0 – smoking and field of study are independent (not related). Probability of being a smoker is
the same regardless of the field of study.

2. Collect data.
Collect observations (run questionnaire). Build the contingency table, calculate expected values
under the null model (probs are independent).

3. Calculate a test statistic that compares the observed data to what we would expect if the null
hypothesis were true.
Calculate the χ2 statistic and the number of degrees of freedom:
χ2 = 1.435
k=1
Hypothesis testing – workflow

4. Define a threshold (alpha value; a) to determine whether a test statistic is statistically significant,
i.e., to determine whether to reject the null hypothesis based on a given p-value.
Let's choose the commonly used α = 0.05 threshold.

5. If the probability of obtaining the test statistic (the p-value) is below this threshold, we reject the
null hypothesis; otherwise we do not reject it.
Calculated probability is 0.231. It is larger than the α threshold of 0.05.
We therefore retain (i.e., cannot reject) our null hypothesis, H0.
H0 – smoking and field of study are independent (not related). The probability of being a
smoker is the same regardless of the field of study.
Hypothesis Testing: field of study & smoking

The epicrisis (a critical or analytical summary):


We could not reject the H0 and therefore cannot conclude that there is a statistically
significant relationship (i.e., dependence) between field of study (physics & medicine)
and the probability of smoking.

If we were to reject H0 anyway (thus accepting the H1 alternative – that smoking and
field of study are indeed associated, i.e., not independent), the probability of us
making a Type I error would be 0.231.

That is, there would be a 23% chance that we are committing a Type I error, whereas the
typically accepted Type I error rate in Biology is 5%.
Summary
• A Type I error is when we reject H0 when it is actually true.
• A Type II error is when we retain H0 when it is actually false.
• The number of degrees of freedom (df) is the total number of categories (or observations) minus the number
of categories (or observations) which we can calculate given the marginal (or total):
• k = n – 1 for univariate data (e.g., the spider example last lecture had 4 columns; 4 – 1 = 3 df)
• k = (r – 1) ∙ (c – 1) for bivariate data; r = number of rows, c = number of columns
• The χ2 goodness-of-fit and homogeneity tests are similar in terms of calculations.
• They differ in that a goodness-of-fit test compares against a known distribution, whereas homogeneity test
checks if one categorical variable correlates with the other variable (whether they are dependent).
• Both tests get “sketchy” if values in any cell are fewer than 5, and really, really sketchy if fewer than 1 (zero).
• When this is the case, or number of total observations < 50, use Fisher’s Exact Test!
