Guide For Statistical Analysis For IA - Simple Ver

IB Biology
Guide for Statistical Analysis for IA - Technical Side
Before you start reading

This guide aims to:
1. Show how to use Excel to carry out statistical analysis

2. Give a brief introduction to statistics
What are our objectives?

We use statistics to test for either:
● Relationships: e.g. does the time invested in studying affect grades?
● Differences: e.g. does studying for 30 mins versus 60 mins make a significant difference to grades?
Key terms you should know

Data can be separated into the following categories:
Mean, standard deviation, and normal distribution

In statistics we often expect measurements to follow a normal distribution, which means values close to the mean are the most probable and extreme values are the least probable:
μ = mean; 𝜎 = standard deviation

● We expect 68.27% of values to be found within μ ± 1𝜎
● We expect 99.73% of values to be found within μ ± 3𝜎
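The two percentages above can be checked numerically. The following is a minimal sketch in Python (an assumption of this example; the guide itself works in Excel), using `statistics.NormalDist` from the standard library. The helper name `coverage` is purely illustrative.

```python
from statistics import NormalDist

def coverage(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution lying within mu +/- k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The result does not depend on the particular mean or standard deviation.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.2%}")
```

The loop prints roughly 68.27%, 95.45% and 99.73% for 1, 2 and 3 standard deviations.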
Standard deviation can be interpreted as the spread (distribution) of data away from the mean:
*Important: The statistical tests in this guide all assume that the data are normally distributed and that the groups being compared have equal variance. You can discuss whether this assumption holds for your data in the evaluation part of your IA.
Hypothesis testing and p-value

In inferential statistics, it is common to test whether two values are different from one another
(...statistically, not just numerically).
E.g. the grades before revision are no different from the grades after revision
(that is, any grade difference is not due to revision, but due to chance and randomness).
Usually we set up two hypotheses for testing:
● Null hypothesis (H0): the two values are not different from each other
● Alternative hypothesis: the two values are different from each other
The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, and it is not the same thing as the significance level.
Normally, we use 0.05 (5%) as the significance level: if p < 0.05, a result like ours would arise by chance less than 5% of the time if the null hypothesis were true. We then reject the null hypothesis in favour of the alternative hypothesis.
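To make the idea of "due to chance" concrete, here is a small sketch in Python (an assumption of this example; the guide itself uses Excel). It does not use a t-test. Instead it uses a swapped-in technique, an exact sign-flip test: if revision had no effect, each student's grade change would be equally likely to be positive or negative, so we count how many sign patterns give a total change at least as large as the observed one. The data and the function name are hypothetical.

```python
from itertools import product

def sign_flip_p_value(differences):
    """Exact two-sided p-value for paired differences under H0.

    Under H0 (no real effect) every pattern of plus/minus signs on the
    differences is equally likely, so the p-value is the fraction of
    patterns whose total is at least as extreme as the observed total.
    """
    observed = abs(sum(differences))
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        flipped_sum = sum(s * d for s, d in zip(signs, differences))
        if abs(flipped_sum) >= observed:
            as_extreme += 1
    return as_extreme / total

# Hypothetical grade changes (after revision minus before) for six students.
grade_changes = [3, 5, 2, 4, 6, 1]
p = sign_flip_p_value(grade_changes)
print(f"p = {p:.4f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

With these made-up numbers only 2 of the 64 sign patterns are as extreme as the observed one, so p = 2/64 (about 0.031) and the null hypothesis is rejected at the 0.05 level.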
Statistical models
There are different statistical models that help us to find relationships or differences, including:
Finding relationships                             Comparing means
Correlation: finding if two things are related    t-test (one sample t-test, two sample unpaired t-test, two sample paired t-test)
Regression: finding cause-effect relationships    ANOVA → Tukey test
So which models should I use?

Here is a list of questions that you should ask yourself first about your data:
1. Are you looking for a relationship or a difference?

a. Relationship
b. Difference
Testing for relationships: Causation or correlation?

It is important to know the difference between causation and correlation.
Prices increased from 2006 to 2018 due to inflation, while the number of McDonald's stores also increased over the same period... so does McDonald's cause inflation???
Causation (cause-effect): the number of McDonald's stores causes inflation??? (Of course NOT! This is not a cause-effect relationship.)
Correlation (association): prices and the number of McDonald's stores may be related, without one causing the other
Causations
For causation, we are already confident that a cause will lead to an effect.
E.g. temperature will lead to changes in enzyme activity

E.g. osmotic pressure will lead to changes in cell size
Plotting Regression
To describe a cause-effect relationship with numbers, regression analysis is required:

https://www.youtube.com/watch?v=Cltt47Ah3Q4&ab_channel=JalayerAcademy
This video will teach you how to plot a regression and do the data analysis in Excel
R^2 Value
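R^2 (coefficient of determination) value: The coefficient of determination ranges from 0 to 1 and gives the proportion of the variation in the dependent variable that is explained by the regression line, with 1 indicating the strongest relationship. A high R^2 on its own does not prove causation.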
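As a rough illustration of where the regression line and R^2 come from, here is a minimal least-squares sketch in Python (an assumption of this example; the guide itself uses Excel, where the SLOPE, INTERCEPT and RSQ functions give the same quantities). The temperature and enzyme activity values are made up for illustration.

```python
from statistics import mean

def linear_regression(x, y):
    """Least-squares line y = slope * x + intercept, plus R^2."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (unexplained variation) / (total variation)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical data: temperature (deg C) vs enzyme activity (arbitrary units).
temperature = [10, 15, 20, 25, 30, 35]
activity = [2.1, 3.0, 4.2, 4.9, 6.1, 6.8]
slope, intercept, r2 = linear_regression(temperature, activity)
print(f"activity = {slope:.3f} * temperature + {intercept:.3f}, R^2 = {r2:.3f}")
```

An R^2 close to 1 means the points lie close to the fitted line; it says nothing by itself about whether the relationship is statistically significant.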
Finding significance of R^2 value by ANOVA box

After fitting the regression and obtaining the R^2 value, we also need to check whether the relationship is statistically significant, using the ANOVA box in Excel's regression output (the "Significance F" value is the p-value).
Here we consider the following hypotheses:
H0: the two variables have no relationship

H1: the two variables have a relationship
Correlation
For correlation, we are not sure whether two factors are associated.
E.g. Is attendance related to the number of goals scored in a football match?
E.g. Is there a relationship between the money an individual owns and their happiness index?
Finding the correlation coefficient, r-value

This video will show you how to plot correlation graphs and find the r-value
https://www.youtube.com/watch?v=1_jeoqjHtjA&ab_channel=DavidLanger
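For readers who want to see what the r-value actually measures, here is a minimal sketch of the Pearson correlation coefficient in Python (an assumption of this example; in Excel the CORREL function returns the same number). The attendance and goals figures are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: match attendance (thousands) vs goals scored.
attendance = [12, 18, 25, 31, 40, 44]
goals = [1, 3, 2, 4, 3, 5]
print(f"r = {pearson_r(attendance, goals):.3f}")
```

The result always lies between -1 (perfect negative correlation) and +1 (perfect positive correlation), with 0 meaning no linear association.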
Here’s how we interpret the r-value:

P-value analysis for Correlation

After finding the r-value, we also need to check whether the correlation is statistically significant, using the p-value. This video shows you how:
https://www.youtube.com/watch?v=vFcxExzLfZI&ab_channel=QuantitativeSpecialists
Here we consider the following hypotheses:
H0: the two variables have no correlation

H1: the two variables are correlated
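One common way to obtain this p-value is to convert r into a t statistic, t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. The sketch below does this in Python (an assumption of this example; the guide itself uses Excel). The helper `t_two_tailed_p` is an illustrative name: it approximates the two-tailed p-value by numerically integrating the t distribution, which is what Excel's T.DIST.2T function reports. The values of r and n are hypothetical.

```python
import math

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for Student's t, by Simpson's rule on the t density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_area)

def correlation_p_value(r, n):
    """t statistic and two-tailed p-value for H0: no correlation (needs |r| < 1)."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, t_two_tailed_p(t, df)

# Hypothetical result: r = 0.62 from 20 pairs of measurements.
t_stat, p = correlation_p_value(0.62, 20)
print(f"t = {t_stat:.3f}, p = {p:.4f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

The same r-value can be significant with a large sample and not significant with a small one, which is why the p-value is checked in addition to r.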
Testing for differences

Here’s an overview of the framework for selecting a model when testing for differences, and the statistical tests involved:
1. Are you comparing two means only?

a. Yes → t-test
b. No → one-way ANOVA
t-test
Since there are several kinds of t-test, ask yourself this question:
1. Am I…
a. Comparing 1 sample mean to the population mean → one sample t-test
E.g. comparing the average IB results of delia GP to IB worldwide results

E.g. comparing the weight of class 6A to the average weight of Hong Kong S6 students
b. Comparing 2 sample means that are not related → Student’s t-test / independent t-test
E.g. comparing the IB results of class 5B and class 5C

E.g. comparing the weight of patients who take drug A and those who take drug B
c. Comparing 2 sample means that are related (e.g. the same subjects measured at different times) → two sample paired t-test
E.g. comparing the results of students before and after attending Mr. Samson’s class
E.g. comparing the weight of patients before and after taking drug A
One sample t-test: Comparing sample mean to a known population mean
Null hypothesis (H0): the sample mean = known population mean
Alternative hypothesis: the sample mean is different from the known population mean
● If p<0.05, reject the null hypothesis

● If p>0.05, do not reject the null hypothesis
Example:
The labels on a brand of protein bars claim that each bar contains 20 grams of protein. A random sample of 31 bars was collected from a number of different stores, and their protein contents were measured.
Null hypothesis: the mean protein content = 20 grams
Alternative hypothesis: the mean protein content is different from 20 grams
The t-test gives a p-value of 0.0046 < 0.05, therefore the null hypothesis is rejected.
Conclusion: The mean protein content differs significantly from the 20 grams claimed on the labels.
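The calculation behind a one sample t-test can be sketched in a few lines of Python (an assumption of this example; the guide itself uses Excel). The eight protein values below are invented and are not the 31 bars from the example above, so the p-value will not match 0.0046. `t_two_tailed_p` is an illustrative helper that approximates the two-tailed p-value by numerical integration.

```python
import math
from statistics import mean, stdev

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for Student's t, by Simpson's rule on the t density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 1.0 - 2 * total * h / 3)

def one_sample_t_test(sample, population_mean):
    """Return (t, degrees of freedom, two-tailed p-value)."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # stdev uses n - 1
    t = (mean(sample) - population_mean) / standard_error
    df = n - 1
    return t, df, t_two_tailed_p(t, df)

# Hypothetical protein contents (grams) of eight bars.
protein = [20.7, 21.1, 19.8, 21.5, 20.9, 21.8, 20.4, 21.2]
t_stat, df, p = one_sample_t_test(protein, 20)
print(f"t = {t_stat:.3f}, df = {df}, p = {p:.4f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

The t statistic measures how many standard errors the sample mean lies from the claimed value; the further from zero, the smaller the p-value.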
Performing in Excel
https://www.youtube.com/watch?v=OCSmMABkVqQ&list=PLEDQSOItvrBat1jK4QtWaWptHWu-wVzW_&index=1&t=4s&ab_channel=TopTipBio
Effect Size
To investigate the magnitude of the difference, please conduct Effect Size Analysis.
Student’s t-test / independent t-test / Two sample unpaired t-test: 2 sample means are not related
Null hypothesis (H0): the 2 sets of data have the same mean
Alternative hypothesis: the 2 sets of data have different means

Example:
The weight of 100 individuals is measured: 50 women (group A) and 50 men (group B). We want to know if the mean weight of women (mA) is significantly different from that of men (mB).
H0: mean weight of men = mean weight of women

Ha: mean weight of men ≠ mean weight of women
Results:
Since the p-value = 0.01327 < 0.05, H0 is rejected and there is a significant difference between the weights of men and women.
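A minimal sketch of the unpaired (Student's) t-test in Python follows (an assumption of this example; the guide itself uses Excel). It pools the two variances, which relies on the equal-variance assumption stated earlier in this guide and corresponds to type 2 in Excel's T.TEST function. The weights are invented, so the result will not match the p-value quoted above.

```python
import math
from statistics import mean, variance

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for Student's t, by Simpson's rule on the t density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 1.0 - 2 * total * h / 3)

def independent_t_test(a, b):
    """Student's t-test with pooled variance: (t, df, two-tailed p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / standard_error
    return t, df, t_two_tailed_p(t, df)

# Hypothetical weights (kg) for two unrelated groups.
women = [55.2, 61.0, 58.4, 63.1, 57.7, 60.3]
men = [68.5, 72.1, 65.9, 74.0, 70.2, 66.8]
t_stat, df, p = independent_t_test(women, men)
print(f"t = {t_stat:.3f}, df = {df}, p = {p:.4f}")
```

The sign of t only shows which group has the larger mean; the two-tailed p-value is the same either way.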
Performing in Excel:
https://www.youtube.com/watch?v=kmww0EewIp0&list=PLEDQSOItvrBat1jK4QtWaWptHWu-wVzW_&index=2&t=8s&ab_channel=DavidDunaetz
Effect Size
Two sample paired t-test: 2 sample means are related

Null hypothesis (H0): the 2 sets of data have the same mean
Alternative hypothesis: the 2 sets of data have different means

Example 1: the max vertical jump of college basketball players is measured before and after participating in a training program.
● If p-value < 0.05, we can conclude that there is a significant difference in max vertical jump before and after the training program.
● If p-value > 0.05, we cannot conclude that the training program makes a significant difference to the max vertical jump; any improvement or difference could be a result of randomness/chance.
Example 2: the response time of a patient is measured on two different drugs.
https://www.youtube.com/watch?v=N2Rusw-xBIw&ab_channel=SocratGhadban
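A paired t-test is simply a one sample t-test on the differences within each pair, testing whether their mean is zero. Here is a minimal sketch in Python (an assumption of this example; in Excel this corresponds to type 1 in the T.TEST function). The jump heights are invented for illustration.

```python
import math
from statistics import mean, stdev

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for Student's t, by Simpson's rule on the t density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 1.0 - 2 * total * h / 3)

def paired_t_test(before, after):
    """Paired t-test on (after - before): (t, df, two-tailed p)."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    t = mean(differences) / standard_error
    df = n - 1
    return t, df, t_two_tailed_p(t, df)

# Hypothetical max vertical jump (cm) of six players before and after training.
before = [61.0, 58.5, 66.2, 70.1, 63.4, 59.8]
after = [63.5, 59.0, 68.0, 73.2, 64.1, 62.0]
t_stat, df, p = paired_t_test(before, after)
print(f"t = {t_stat:.3f}, df = {df}, p = {p:.4f}")
```

Because each subject acts as their own control, the two lists must be in the same order and of the same length.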
Effect Size
More than 2 Means: One-way ANOVA

When more than two means are compared, one-way ANOVA can be used.
Null hypothesis (H0): all groups have the same mean

Alternative hypothesis: at least one group has a different mean
● If p<0.05, reject the null hypothesis → find out which groups differ
● If p>0.05, do not reject the null hypothesis → END
Example 1:
Example 2:
Test if there’s a significant difference in SO2 concentrations at different times at a construction site.
ANOVA results:
● Site A: p>0.05, the null hypothesis is not rejected and there is no significant difference
● Site B: p<0.05, the null hypothesis is rejected and there is a significant difference
Example 3:
Test if there’s a significant difference in tree heights among 3 species
As the p-value is reported as 0.000 (i.e. p < 0.001) and so is below 0.05, there is a significant difference in tree height between at least two groups. Post-hoc tests need to be conducted to find out which pairs of tree species differ significantly.
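The F statistic that one-way ANOVA reports compares the variation between group means with the variation inside the groups. The sketch below computes it in Python (an assumption of this example; the guide itself uses Excel). It stops at the F value and degrees of freedom; the p-value can then be read from Excel, for example with F.DIST.RT(F, df_between, df_within). The tree heights are invented for illustration.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical tree heights (m) for three species.
species_a = [12.1, 13.4, 11.8, 12.9]
species_b = [15.2, 14.8, 16.1, 15.5]
species_c = [12.5, 13.0, 12.2, 13.6]
f, df_b, df_w = one_way_anova_f([species_a, species_b, species_c])
print(f"F = {f:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ by more than the scatter within groups would suggest, but it does not say which groups differ; that is the job of the post-hoc test.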
Conducting on Excel:
https://www.youtube.com/watch?v=ZvfO7-J5u34&list=PLEDQSOItvrBat1jK4QtWaWptHWu-wVzW_&index=5&t=6s&ab_channel=TopTipBio
Post hoc (meaning: after) test: Tukey HSD

Interpretation of results
In this example, the weight loss resulting from different daily exercise times is compared:
From the above result table:

1. The difference in weight loss between 30 minutes per day of exercise and no exercise (p = 0.852) is not significant
2. The difference in weight loss between 60 minutes per day of exercise and no exercise (p reported as 0.000, i.e. p < 0.001) is significant
3. The difference in weight loss between 60 minutes per day of exercise and 30 minutes per day of exercise (p reported as 0.000, i.e. p < 0.001) is significant
https://www.youtube.com/watch?v=YbX-JUqD1so&list=PLEDQSOItvrBat1jK4QtWaWptHWu-wVzW_&index=4&t=6s&ab_channel=VincentStevenson
Effect size
Effect Size
Further reading (why is p-value not enough): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/
Once a significant difference between sample means has been found, the effect size can be calculated to show the magnitude of the experimental effect.
There are several scenarios in which effect size can be used:

● Calculating before and after effects
● Calculating experiment vs control group effects
Effect sizes can be calculated by:
An effect size (d) of 1 indicates the two groups differ by 1 standard deviation, a d of 2 indicates they differ by 2 standard deviations, and so on.
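The formula that originally followed "Effect sizes can be calculated by:" did not survive in this copy of the guide. Because the text speaks of "a d of 2", a plausible reading is Cohen's d: the difference between the two means divided by a standard deviation. The Python sketch below (an assumption of this example, including the choice of a pooled standard deviation as the denominator) shows that calculation with invented scores.

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

# Hypothetical test scores for an experimental group and a control group.
experiment = [72, 78, 69, 81, 75, 77]
control = [68, 70, 65, 74, 71, 66]
print(f"d = {cohens_d(experiment, control):.2f}")
```

A positive d means the first group has the higher mean, and a negative d means the second group does.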
Example 1 : paired t-test

Paired t-test results have shown that there is a significant difference between grades before and after revision. Effect size can tell us how large the impact of revision on student grades is.
Let’s assume that the effect size in the above example is 0.64. This means that the grades after revision are 0.64 standard deviations higher than the grades before revision.
Example 2: ANOVA
The mean scores of students who study at different times are collected. ANOVA shows that there are significant differences in the scores of students who study at different times.
Effect size can be calculated by:

([Mean score studying at time A] - [Mean score of control group]) / (standard deviation of whole population)
The following results table is obtained:

Variable pairs      Effect size
Time A - control    0.8
Time B - control    0.6
Time C - control    -0.2
Based on the effect sizes, we can conclude that the effects on grades are ordered:
Time C < Time B < Time A,
so time A has the most positive impact on mean scores, while time C is the only time with a negative impact.
Conducting on Excel
https://www.youtube.com/watch?v=zUmQ2PZZRJ4&ab_channel=TopTipBio
