Guide For Statistical Analysis For IA - Simple Ver
IB Biology
Guide for Statistical Analysis for IA - Technical Side
In statistics we often assume that measurements follow a normal distribution, which means that values near the average are the most probable and extreme values are the least probable.
Standard deviation can be interpreted as the spread (distribution) of data away from the mean.
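As a minimal sketch of how the mean and standard deviation describe a data set, the Python snippet below uses invented seedling heights (the values and variable names are assumptions for illustration only). For normally distributed data, roughly 68% of values are expected to lie within one standard deviation of the mean.

```python
import statistics

# Hypothetical data: heights (cm) of 8 seedlings, invented for illustration.
heights = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5, 11.6, 12.7]

mean = statistics.mean(heights)
sd = statistics.stdev(heights)  # sample standard deviation (divides by n - 1)

# Collect the values that lie within one standard deviation of the mean.
within_one_sd = [h for h in heights if abs(h - mean) <= sd]

print(f"mean = {mean:.2f} cm, SD = {sd:.2f} cm")
print(f"{len(within_one_sd)} of {len(heights)} values lie within 1 SD of the mean")
```

With a sample this small, the share of values within one standard deviation will only loosely match the 68% expected for a true normal distribution.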
*Important: In our statistical analysis, we assume that the data are normally distributed and that the groups being compared have equal variance. You can discuss this assumption in the evaluation part of your IA.
● Null hypothesis (H0): two values have no difference from each other
E.g. grades before revision are no different from grades after revision (any grade difference is due to chance and randomness, not to revision)
● Alternative hypothesis: two values are different from each other
The p-value is the probability of obtaining a difference at least as large as the one observed if the null hypothesis were true. It is not the probability that the null hypothesis is true.
Normally, we use 0.05 (a probability of 5%) as the significance level: if p < 0.05, a result this extreme would occur less than 5% of the time by chance alone under the null hypothesis, so we reject the null hypothesis in favour of the alternative hypothesis.
Statistical models
There are different statistical models that help us to find relationships or differences, including:
E.g. prices increased from 2006 to 2018 due to inflation, while the number of McDonald's stores also increased. So is McDonald's the cause of inflation?
Causation (cause-effect): the number of McDonald's stores causes inflation? Of course NOT! This is not a cause-effect relationship.
Correlation (association): prices and the number of McDonald's stores may be related, without one causing the other.
Causations
For causations, we are already certain that a cause leads to an effect.
Plotting Regression
R^2 Value
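R^2 (coefficient of determination) value: The coefficient of determination ranges from 0 to 1 and gives the proportion of the variation in the dependent variable that is explained by the regression line. A value of 1 means the line fits the data perfectly. A high R^2 shows a strong relationship, but it does not by itself prove causation.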
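To make the R^2 value concrete, here is a small sketch that fits a straight line by least squares and computes R^2 as 1 minus the ratio of unexplained variation to total variation. The function name `linear_fit_r2` and the temperature and rate data are hypothetical, chosen only for illustration.

```python
def linear_fit_r2(x, y):
    """Least-squares line y = intercept + slope * x and its R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared distances from the points to the line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation: squared distances from the points to the mean of y.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: temperature (degrees C) against reaction rate (arbitrary units).
temperature = [10, 20, 30, 40, 50]
rate = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r2 = linear_fit_r2(temperature, rate)
print(f"rate = {intercept:.2f} + {slope:.3f} * temperature, R^2 = {r2:.3f}")
```

An R^2 close to 1 here only says that a straight line describes these points well; whether temperature actually causes the change in rate has to come from the experimental design.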
Correlation
For correlations, we are not sure whether two factors are associated.
E.g. Does attendance affect the number of goals scored in a football match?
E.g. Is there a relationship between the money owned by an individual and their happiness index?
t-test
Since there are multiple t-tests, you may want to ask yourself this question:
1. Am I…
a. Comparing 1 sample mean to the population mean → one-sample t-test
b. Comparing 2 sample means that are not related → Student’s t-test / independent t-test
c. Comparing 2 sample means that are related (e.g. measured at different periods of time) → two-sample paired t-test
E.g. comparing the results of students before and after attending Mr. Samson’s class
E.g. comparing the weight of patients before and after taking drug A
Example:
The labels on the protein bars claim that each bar contains 20 grams of protein. A random sample of 31 energy bars was collected from a number of different stores, and their protein contents were measured.
Null hypothesis: the mean protein content of the bars is 20 grams
Alternative hypothesis: the mean protein content of the bars is not 20 grams
Results of the t-test show a p-value of 0.0046 < 0.05, therefore the null hypothesis is rejected.
Conclusion: The mean protein content differs significantly from the 20 grams claimed on the labels.
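The calculation behind a one-sample t-test can be sketched in plain Python as below. This is an illustrative sketch, not the method used to obtain the p-value above: the function names are hypothetical, the five protein values are invented (they are not the 31 bars from the example), and the p-value is approximated by numerically integrating the t distribution with Simpson's rule.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t distribution.

    Integrates the t probability density from 0 to |t| with Simpson's rule
    (steps must be even) and returns the area left in both tails.
    """
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1 - central_area)


def one_sample_t_test(sample, claimed_mean):
    """Return (t, degrees of freedom, two-sided p) for H0: mean == claimed_mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - claimed_mean) / standard_error
    return t, n - 1, t_two_sided_p(t, n - 1)


# Hypothetical protein contents (g) of 5 bars, invented for illustration.
protein = [22, 24, 23, 25, 21]
t, df, p = one_sample_t_test(protein, claimed_mean=20)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

The t statistic measures how many standard errors the sample mean lies from the claimed mean; the further from 0, the smaller the p-value.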
Performing on Excel
https://www.youtube.com/watch?v=OCSmMABkVqQ&list=PLEDQSOItvrBat1jK4QtWaWptHWu-wVzW_&index=1&t=4s&ab_channel=TopTipBio
Effect Size
To investigate the magnitude of the difference, please conduct Effect Size Analysis.
Example:
The weights of 100 individuals are measured: 50 women (group A) and 50 men (group B). We want to know if the mean weight of women (mA) is significantly different from that of men (mB).
Results:
Since p-value = 0.01327 < 0.05, H0 is rejected and there is a significant difference between the mean weights of men and women.
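For reference, the t statistic of an independent (Student's) t-test can be sketched as follows. The sketch assumes equal variances in the two groups, as stated at the start of this guide, and therefore pools the two sample variances. The function name and the weights are hypothetical; they are not the data behind the p-value above.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variances (divide by n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: average of the two variances, weighted by their df.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df


# Hypothetical weights (kg), invented for illustration.
women = [58, 62, 55, 60, 65]
men = [70, 68, 75, 72, 65]

t, df = independent_t(women, men)
print(f"t = {t:.3f}, df = {df}")
```

The two-sided p-value then comes from the t distribution with `df` degrees of freedom, for example by passing the absolute value of t to Excel's T.DIST.2T.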
Performing on Excel:
https://www.youtube.com/watch?v=kmww0EewIp0&list=PLEDQSOItvrBat1jK4QtWaWptHWu-wVzW_&index=2&t=8s&ab_channel=DavidDunaetz
Effect Size
To investigate the magnitude of the difference, please conduct Effect Size Analysis.
Example 1: the max vertical jump of college basketball players is measured before and after participating in a training program.
● If p-value <0.05, then we can conclude that there is a significant difference in max vertical jump before and after the training program.
● If p-value >0.05, then we cannot conclude that the training program makes a significant difference to the max vertical jump; any improvement or difference may be a result of randomness/chance.
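A paired t-test is effectively a one-sample t-test on the per-subject differences, with the null hypothesis that the mean difference is 0. The sketch below shows that calculation; the function name and the jump heights are hypothetical and invented for illustration.

```python
import math
import statistics


def paired_t(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    if len(before) != len(after):
        raise ValueError("each subject needs one 'before' and one 'after' value")
    # One difference per subject: after minus before.
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    t = statistics.mean(differences) / standard_error
    return t, n - 1


# Hypothetical max vertical jumps (cm) of 5 players, invented for illustration.
before = [60, 62, 58, 65, 70]
after = [63, 64, 59, 69, 75]

t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Because each player is compared only with themselves, differences between players do not inflate the standard error, which is why the paired test is preferred for before-and-after designs.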
Performing on Excel
https://www.youtube.com/watch?v=N2Rusw-xBIw&ab_channel=SocratGhadban
Effect Size
To investigate the magnitude of the difference, please conduct Effect Size Analysis.
● If p<0.05, reject the null hypothesis → conduct post-hoc tests to find out which groups differ
● If p>0.05, do not reject the null hypothesis → END
Example 1:
Example 2:
Test if there’s a significant difference in SO2 concentrations at different times in a construction
site.
ANOVA results:
● Site A: p>0.05, the null hypothesis is not rejected and there is no significant difference
● Site B: p<0.05, the null hypothesis is rejected and there is a significant difference
Example 3:
Test if there’s a significant difference in tree heights in 3 species
As the p-value is reported as 0.000 (i.e. p < 0.001), which is below 0.05, there is a significant difference in tree height between at least two groups. Post-hoc tests need to be conducted to find out which pairs of tree species have significant differences.
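The F statistic that a one-way ANOVA reports compares the variation between group means with the variation within groups. A minimal sketch of that calculation is given below; the function name and the tree heights are hypothetical and are not the data from Example 3.

```python
def one_way_anova_f(groups):
    """F statistic with its between-group and within-group degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical tree heights (m) for 3 species, invented for illustration.
species_heights = [
    [4.1, 4.5, 4.3, 4.0],
    [5.2, 5.0, 5.5, 5.1],
    [4.4, 4.6, 4.2, 4.7],
]

f, df1, df2 = one_way_anova_f(species_heights)
print(f"F = {f:.2f} with ({df1}, {df2}) degrees of freedom")
```

A large F means the group means differ by more than the spread within groups would suggest. The p-value comes from the F distribution with those two degrees of freedom (Excel's F.DIST.RT is one option), and a significant result still needs post-hoc tests to show which pairs differ.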
Conducting on Excel:
https://www.youtube.com/watch?v=ZvfO7-J5u34&list=PLEDQSOItvrBat1jK4QtWaWptHWu-wVzW_&index=5&t=6s&ab_channel=TopTipBio
Performing on Excel
https://www.youtube.com/watch?v=YbX-JUqD1so&list=PLEDQSOItvrBat1jK4QtWaWptHWu-wVzW_&index=4&t=6s&ab_channel=VincentStevenson
Effect size
To investigate the magnitude of the difference, please conduct Effect Size Analysis.
Effect Size
Further reading (why is p-value not enough): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/
Once a significant difference between sample means has been found, the effect size can be calculated to show the magnitude of the experimental impact.
An effect size (Cohen's d) of 1 indicates the two groups differ by 1 standard deviation, a d of 2 indicates they differ by 2 standard deviations, and so on.
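One common way to compute this effect size for two groups is to divide the difference between the means by the pooled standard deviation. The sketch below assumes that choice of standard deviation, which this guide does not specify; the function name and the scores are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference between two means in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    return difference / math.sqrt(pooled_var)


# Hypothetical test scores, invented for illustration.
after_revision = [72, 75, 78, 80, 70]
before_revision = [68, 70, 74, 77, 66]

d = cohens_d(after_revision, before_revision)
print(f"Cohen's d = {d:.2f}")
```

The sign of d shows the direction of the difference (positive when the first group has the higher mean), while its size shows how many standard deviations separate the two means.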
Example 2: ANOVA
The mean scores of students who study at different times are collected. ANOVA results show that there are significant differences in the scores of students who study at different times.
Effect size = ([Mean score studying at time A] - [Mean score of control group]) / (standard deviation of whole population)
Based on the effect sizes, we can conclude that time A has the most positive impact on mean scores, while time C has the most negative impact.
Conducting on Excel
https://www.youtube.com/watch?v=zUmQ2PZZRJ4&ab_channel=TopTipBio