IB Statistics Handbook
Mag Karl Schauer BSc
WHY STATISTICS?
An academic investigation is a way to try to answer a question. This question must be
defined, and a method determined to collect appropriate data. Predictions are then made
based on the knowledge gained by answering previous questions in previous
investigations. So where do statistics come in? Statistics are the tool you need to boil
down all of your carefully collected data into a clear answer. Importantly, they also tell you
how sure you can be of that answer.
WHAT SHOULD YOU KNOW?
In order to complete your internally assessed work or data-based extended essays in the IB,
you will need to apply some basic statistics. You will need to summarise and describe your
data using descriptive statistics like averages and standard deviations. Then you will need
to present your data in tables and graphs. Finally, depending on the investigation, you may
need to perform a hypothesis test or other calculations to definitively answer your
question. You won’t generally be expected to do the sometimes complicated calculations
by hand. Tools like spreadsheet software (Excel, LibreOffice etc.) or your TI-Nspire
handheld make many calculations a trivial matter of entering numbers. You will likely need
to look up tutorials for your particular software online, since packages and platforms
differ, but tutorials are readily available (see this list of
resources for help). What is trickier, and mostly up to you, is deciding what statistics to
apply in what circumstances, and understanding what those calculations tell you. This
handbook should help you with those decisions and that understanding.
This handbook is intended to be used digitally, and contains some cross-referencing and
external links. Underlined text, as well as the table of contents, can be clicked to take you
where you want to go.
TYPES OF DATA
You might encounter all types of data in your investigations. It is important to distinguish
between a few different types of data because not all statistical techniques work with all
types of data.
CATEGORICAL DATA
This type of data fits into defined categories. For example: red, green and blue as options
for people’s favourite colour are categories.
ORDINAL DATA
This is similar to categorical data, but there is a clear order of the groups. For example:
low, medium and high income categories. These categories do not necessarily have the
same distances between them.
NUMERICAL DATA
This type of data includes measurements of all kinds. There is a clear order, as in ordinal
data, but the distances between data points are clearly defined. Length, mass and speed are
all numerical data.
FREQUENCY
When a statistician uses the word frequency, they generally mean a count of the number of
things. ‘What is the frequency of…’ can usually be translated to ‘how many…’.
OTHER DATA
This list is certainly not complete. There are other specialty types of data that you might
encounter, but these should be sufficient for most of your investigations.
SAMPLING TECHNIQUES
Since you will never have enough time or resources to measure all of the possible data
points in the population (and if you use statistics, you shouldn’t need to), you will only
ever measure a small portion of all of the possible points, called a sample. But which data
points should go into the sample? In order to have a fair test, it is important that each
possible data point is equally likely to be chosen for the sample. That is to say, there
should be no sampling bias. In order to do this, you will need to use a sampling strategy
that fits your investigation.
RANDOM SAMPLING
Just to be clear, it is not sufficient to claim that a sample is random if you have simply
chosen ‘at random’ places to measure. You might have a subconscious bias for certain
measurements. To be truly random, you will need to assign a number to each possible data
point, and use a random number generator to tell you which measurements to collect.
One way to do this is with the RAND() function in spreadsheet software. Simply enter
‘=RAND()’ in a cell, and it will show you a random number between 0 and 1. You can
then multiply this number by whatever you need to in order to have a random number
between 0 and that number. For example: if you wanted a random number between 0 and
100, you could simply type ‘=RAND()*100’ into a cell.
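If you generate your random numbers in a programming language rather than a spreadsheet, the same idea applies. A quick sketch in Python (the range of 100 is just an example, as above):

```python
import random

r = random.random()              # a random number between 0 and 1, like RAND()
scaled = random.random() * 100   # a random number between 0 and 100
whole = random.randint(1, 100)   # a random whole number from 1 to 100, useful for
                                 # picking numbered data points for your sample
```

For sampling, `randint` is often the more convenient form, since data points are usually numbered with whole numbers.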
SYSTEMATIC SAMPLING
In this technique, you simply choose to sample at regular intervals. For example, you
might choose to make a measurement every ten metres along a transect.
STRATIFIED SAMPLING
This more complicated sampling method is only used if the population is made up of
different sub-sets that make up different proportions of the whole. It might be important
to make sure that one sub-set isn’t being over- or under-represented in the data. This is
most commonly used with survey data.
DESCRIPTIVE STATISTICS
Once you have collected your data, you will need to boil it down. Descriptive statistics,
sometimes called summary statistics, do just that: they help your reader see the general
trends and patterns in your data.
AVERAGES
MEAN
This is generally what is meant when someone says ‘average’. It is the sum of the values
divided by the number of values. This is the most common way to summarise sample data.
MEDIAN
This is the ‘middle’ data point. There are just as many data points higher and lower than
this one in the sample. This measure of average is more likely to be used if the sample is
distributed in a strange way, or if outliers might strongly affect the mean. For example, in a
sample measuring personal wealth, one or two billionaires might skew the mean so heavily
that the ‘average’ wealth is higher than that of almost every single individual in the sample.
In this case it would be appropriate to use the median to better represent the sample.
MODE
Mode is much less commonly used. It is the ‘most common’ data point. Or in other words,
the data point with the highest frequency.
For example, in the data set 1, 3, 3, 5, 8, 11, 12, the mode is 3 (and the median is 5).
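None of these averages need to be computed by hand. Python’s statistics module, for instance, returns all three for the example data set above:

```python
from statistics import mean, median, mode

data = [1, 3, 3, 5, 8, 11, 12]
mean(data)    # ≈ 6.14: the sum 43 divided by the 7 values
median(data)  # 5: the middle value in rank order
mode(data)    # 3: the most common value
```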
STANDARD DEVIATION
Standard deviation measures the typical distance between each data point and the mean
(formally, it is the square root of the average squared difference from the mean).
A large standard deviation, relative to the size of the measurement, means that the data is
very spread out. This means that there are generally large differences between data points.
A small standard deviation, relative to the size of the measurements, indicates that the
measurements are close together, or that they ‘agree’.
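As a quick check of your spreadsheet’s output, the sample standard deviation can also be computed in Python (the measurements below are made up for illustration):

```python
from statistics import mean, stdev

data = [4.9, 5.1, 5.0, 4.8, 5.2]   # made-up repeated measurements
mean(data)   # 5.0
stdev(data)  # ≈ 0.16: small relative to the mean, so the measurements 'agree'
```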
Standard deviations are useful for numerical data. If the data is not numerical, or is not
normal, you will need a different way to show the spread of the data.
1. Arrange the data in rank order and divide it into four equal parts, each containing an
equal number of values. Each section is called a quartile. The quartile containing
the highest values is the upper quartile, while the one with the lowest values is the
lower quartile.
2. The Upper Quartile Value (UQV) or Q3 is the mean of the lowest value in the
upper quartile and the highest value in the quartile below it.
3. The Lower Quartile Value (LQV) or Q1 is the mean of the highest value in the
lower quartile and the lowest value in the quartile above it.
4. The Inter-Quartile Range (IQR) is the difference between the two values calculated
in steps 2 and 3: IQR = Q3 - Q1. A high IQR means the data is very dispersed, while
a low IQR means the data is less dispersed.
For example, for the data set 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11:

Q1 = 3.5, Q2 = 6, Q3 = 8.5

IQR = Q3 - Q1 = 8.5 - 3.5 = 5
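If you want to check quartile values by computer, Python’s statistics.quantiles with method='inclusive' reproduces the mean-of-adjacent-values approach described above:

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
q1, q2, q3 = quantiles(data, n=4, method='inclusive')
iqr = q3 - q1   # 8.5 - 3.5 = 5.0
```

Note that spreadsheets and calculators offer several quartile conventions, so values can differ slightly between tools; state which method you used.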
HISTOGRAMS
A histogram is a way to visualise data in a sample. It is essentially a bar chart with
categories for the measured values (for example 1-10, 11-20, 21-30) on the x-axis and
frequency (the number of data points in that category) on the y-axis.
Here is a deeper explanation of histograms and how they are made by hand:
https://youtu.be/4eLJGG2Ad30
Here is a longer explanation of the different shapes you might encounter in histograms:
https://youtu.be/Y53_8WRrPzg
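The counting step behind a histogram can be sketched in a few lines of Python (the data and the category boundaries here are made up for illustration):

```python
from collections import Counter

# Hypothetical measurements to be sorted into the categories 1-10, 11-20, 21-30.
data = [3, 7, 12, 15, 18, 22, 24, 25, 9, 14]

def category(x):
    # Find the lower bound of the ten-wide category that x falls into.
    low = ((x - 1) // 10) * 10 + 1
    return f"{low}-{low + 9}"

freq = Counter(category(x) for x in data)
# freq is {'1-10': 3, '11-20': 4, '21-30': 3}: the bar heights of the histogram
```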
The standard, ‘bell’, or Gaussian curve shows the pattern of how normally distributed
data spreads around the mean. If your histogram looks like this, your data is probably
normally distributed.

The shaded areas under the curve represent the proportion of data points that will likely
be found in each section of the curve. The x-axis shows standard deviation distances, with
the mean at 0.
The area under the normal curve shows how many data points are likely to be found in any
given range. 68% of data points will, on average, be within one standard deviation of the
mean, and 95% will fall within 2 standard deviations. This is helpful in predicting
probabilities, and this type of math is the basis for hypothesis tests.
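The 68% and 95% figures can be verified with Python’s built-in normal distribution:

```python
from statistics import NormalDist

nd = NormalDist()                  # standard normal curve: mean 0, standard deviation 1
within_1 = nd.cdf(1) - nd.cdf(-1)  # area within one standard deviation, ≈ 0.683
within_2 = nd.cdf(2) - nd.cdf(-2)  # area within two standard deviations, ≈ 0.954
```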
The normal distribution curve is one of many used in statistics, but it is the most common
shape you will likely encounter.
If your data appear to be normally distributed, that is, your histogram appears to have a
normal curve shape, then you may be able to use some hypothesis tests that require
normal data as a prerequisite.
DATA TABLES
Once you have boiled your data down into some tangible values, you will need to present
the raw data and your descriptive statistics in well-organised tables. Designing data tables
is an art form all its own. A few points might help you make yours beautiful.
TITLE
Be sure that each table has a meaningful and descriptive title (not just ‘table 1’). With
multiple tables, it is usually a good idea to number them (hint: check that your numbers
are right before you hand in a draft!) so that you can refer to them easily in your text.
LABELS
Your data columns need proper labels including the quantity measured, its units, and the
uncertainty of the measurement (for example: ‘Height of plant (+/- 0.1 cm)’).
DATA
The data itself should have the correct number of significant figures to reflect the precision
of the data (see these links for help with significant figures). Be careful not to show more
precision (more significant figures) in an average than in the raw data it was calculated
from. You will likely need to format the cells of your table to show the appropriate number
of digits, since the trailing zeros disappear otherwise. If you have very large or very small
values, simply use scientific notation.
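If you ever prepare table values in a script rather than a spreadsheet, string formatting does the same job of keeping trailing zeros and switching to scientific notation:

```python
value = 2.5
f"{value:.2f}"       # '2.50': the trailing zero is kept to show the precision
f"{0.000123:.2e}"    # '1.23e-04': scientific notation for very small values
```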
SUMMARY STATISTICS
You may want to include your averages and standard deviations right in the table with your
raw data. If you have a lot of data, or it is relatively complex, you might want to create a
separate data table of your summary statistics. You should use whatever you think will
help your reader see the data best.
FORMATTING
It is usually a good idea, if possible, to present your data table on one page. Having the
first half of a table at the end of one page, with the last half continuing on the next makes
it very hard to get an overview of the data. Also, try to size your columns carefully to fit
them on the page, but not to muddle the titles.
Table 1: The height of 15 Z. mays plants after growing for 30 days at different fertiliser
concentrations in three different field sites.
Columns: Concentration of fertiliser in soil (+/- 0.10 mg/kg) | Field site 1 | Field site 2 |
Field site 3 | Average | Standard deviation
GRAPHICAL TECHNIQUES
Once you have presented your data in tables, you will need to make it more readily visible
to your reader. It is important to choose the right graph for the type of data you are
presenting. The formatting and labelling of the graph is also important. If done well,
graphs should show the reader the answer to your research question at a glance.
BAR GRAPHS
Bar graphs are used to represent numerical data (y-axis) from different categories (x-axis).
Bar graphs of averages should have error bars showing standard deviation, or some other
measure of spread. Somewhere on the graph or in its caption you need to declare what the
error bars represent. Be sure that you are using the standard deviation values that
you calculated in your tables, and not the automatic values that some software packages
apply (incorrectly).

Figure 1: The average growth of cress seeds after growing for 4 days under
different coloured light. The error bars represent one standard deviation.

[Bar graph: Average growth of cress plants after 4 days (+/- 1 mm) on the y-axis against
Color of light applied to growing plants (Red, Blue, Green, Yellow, Orange) on the x-axis.]
LINE GRAPHS
Line graphs and scatter plots are often confused with each other. Line graphs show straight
lines connecting the dots of the data points. This represents the fact that line graphs
show multiple measurements of the same thing: the straight line is an assumed linear
change of the measured value between measurements. Therefore, only use a line graph if
you are tracking the change of something.
[Line graph: Average global temperature (+/- 0.01 °C) on the y-axis against Year
(1870-2010) on the x-axis.]
If you graph average values, you will need error bars to show the spread of the data (see
bar graphs). Be sure, however, to use all data points, not just averages, to calculate an R²
value.
Figure 2: This graph shows a strong positive correlation between the height and
age of the sampled trees.

[Scatter plot: Height of tree (+/- 2 m) on the y-axis against Age of tree (+/- 1 year) on the
x-axis, with trend line y = 0.7334x - 2.8792 and R² = 0.9189.]
CORRELATION
A correlation is a relationship between two numerical variables. A correlation can be
positive or negative:
• Positive correlation: As the independent variable increases, the dependent variable also
increases.
• Negative correlation: As the independent variable increases, the dependent variable
decreases.
The line of best fit or trend line is chosen by the computer to be as close as possible to all
of the data points. It is an approximation of the linear trend in the data. The closer all of
the data points are to the trend line, the stronger the correlation.
In order to make claims about a linear correlation, it is important that the data show a
linear trend. If the data are not linear, or not expected to be linear, it is not appropriate to
compare them to a trend line! Here are some examples of when linear regression is not
appropriate:
[Graph: Enzyme activity on the y-axis against Temperature (+/- 1 °C) on the x-axis. This
shape of graph should not be compared to a line.]
• The rate of a chemical reaction slows over time as substrate is used up. The shape of the
curve produced is predictable and depends on the type of reaction. The graph of the
concentration of the product over time might look something like this:
[Graph: Concentration of product (+/- 0.1 M) on the y-axis against Time (+/- 1 s) on the
x-axis. This shape of graph should not be compared to a line. Instead, a Spearman rank
test can be performed.]
Interpolation is using the trend line to predict values within the range of your data.
Extrapolation is expanding the trend line beyond the data to make predictions outside of
the range of data. The further the predicted value is from the measured values, the less
reliable the extrapolated value will be.
[Scatter plot: Change in mass of potato tissue after 1 h on the y-axis against Concentration
of sucrose solution (+/- 0.01 M) on the x-axis, with trend line y = -7.1143x + 2.073 and
R² = 0.9659. The trend line crosses the x-axis at 0.29 M.]
The high R² value of 0.97 shows that the data have a strong linear correlation. The
negative slope value of -7.1143 shows that the relationship is a negative correlation.
Because the trend is very strong, the equation for the line can be used to make predictions.
Suppose you were asked to determine what concentration would be isotonic to the potato tissue,
that is, at what concentration no net osmosis would occur. At this concentration the
change in the mass of the potato tissue would be zero. To find the corresponding
concentration, you can substitute zero for y (the change in mass) in the trend line
equation, and solve for x (the concentration):
0 = -7.1143x + 2.073

-7.1143x = -2.073

x = -2.073 / -7.1143 = 0.2914
This value of 0.29 M is where the trend line crosses the x axis, and it is the osmolarity, or
isotonic concentration of the potato tissue. It must be rounded to reflect the precision of
the measurements that were used to calculate it. This process could be repeated to predict
any given change in mass or any concentration within the range of the data.
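The same substitution can be done in one line of code, using the slope and intercept from the trend line equation above:

```python
slope, intercept = -7.1143, 2.073   # from the trend line y = -7.1143x + 2.073

x_isotonic = -intercept / slope     # set y = 0 and solve for x
round(x_isotonic, 2)                # 0.29: the isotonic concentration in M
```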
OTHER GRAPHS
Though the graphs listed above are the ones you are most likely to need, there are of course
many other types of graphs. Here are two others that you might consider:
Pie charts show the breakdown of a group into its parts, usually percentages. The
percentages should add up to 100. Avoid too many categories, as the chart can quickly
become difficult to read.
Radar charts can be used to show many different attributes at once, and compare these
between locations or individuals.
HYPOTHESIS TESTING
The goal of an experiment or investigation is to answer a specific question. The data
should make it clear what the answer to that question is. Often, due to the uncertainty
inherent in data, the answer may not be entirely clear. It may look like there is a difference
between two groups, but the difference might only be due to chance. It may appear that
there is a correlation between two variables, but the sample may have been a fluke.
Hypothesis testing allows you to determine how sure you are of the answer, and the
likelihood of the observed pattern being due to chance.
A hypothesis test requires that you make an assumption, and calculate the probability of
this assumption being true. This assumption is called the null hypothesis, H0. If this null
hypothesis assumption can be shown to be very unlikely, then you can conclude instead
that the alternative hypothesis, HA, is true. Despite the naming, these hypotheses are
different from your experimental hypothesis, that is, your reasoning about what you think
will happen in your experiment. You always need to declare and explain an experimental
hypothesis in the exploration portion of your work. You only need to declare null and
alternative hypotheses in the context of your hypothesis test, if you choose to use one.
This should be included in your explanation of the data analysis.
A hypothesis test generates a test statistic. The value of this test statistic gives you
information about how likely your data would be if the null hypothesis were true. The test
statistic can then be compared to a table of critical values; it must be higher or lower than
the critical value in order to conclude a statistically significant result.
Usually this process is simplified, and a p-value is calculated based on the test statistic.
The p of p-value stands for probability: it is the probability of obtaining data at least as
extreme as yours if the null hypothesis were true. It is always a value between 0 and 1
(i.e. 0% and 100%). If the p-value is low enough, then your data would be very unlikely
under the null hypothesis, and it can safely be rejected. When the null hypothesis is rejected, the alternative
hypothesis can be concluded, and there is a statistically significant result.
The p-value is compared to the alpha value. For our intents and purposes you will use an
alpha value of 0.05. This is the threshold probability below which you determine that the
null hypothesis is too unlikely. That is, if the p-value is less than 5%, then your data
would be too unlikely under the null hypothesis for it to be reasonable, and you reject the
null hypothesis. For an example of this process, read the section on t-testing.
If the p-value is above the alpha threshold of 0.05, then you must ‘fail to reject the null
hypothesis’. This is different from accepting the null hypothesis! You don’t have enough
evidence to conclude that the null hypothesis is true. Instead you simply ‘fail to reject’ and
conclude that you cannot be sure whether the observed result is due to random chance or
a real effect.
For example, if a test gives a p-value of 0.2, data like yours would occur about 20% of the
time even if the null hypothesis were true. That is not unlikely enough to rule the null
hypothesis out, but it is also not evidence that the null hypothesis is true. Therefore you
simply ‘fail to reject the null hypothesis’.
The t-test is used when the data can be assumed to be normal and the sample sizes are
relatively large (more than 10 measurements). It might be a good idea to make a
histogram to see if the data appear to be normal, but at the very least you should state that
you assume the data to be normally distributed, and why you think it is.
If the assumptions for normality are met for the t-test, but you have more than two
groups, you will need to perform an ANOVA (analysis of variance) test to see if the
variability between the groups is due to chance or some real effect.
If it is not safe to assume that the data are normally distributed, you have small samples,
or your data are ordinal, but not numerical, then you can make a comparison between two
groups using the Mann-Whitney U Test instead. This test is less likely to find a difference
if there is one, but it is safer to use if the prerequisites for a t-test are unclear or not met.
THE T-TEST
The t-test assumes a null hypothesis that there is no significant difference between the
groups (any observed difference is due to chance), then calculates how likely your data
would be under that hypothesis.

H0: There is no significant difference between the groups. Any observed difference is
due to chance.

HA: The observed difference between the groups is statistically significant, and not
likely due to chance.
Suppose you want to find out if dandelions (T. officinale) grow to different heights in two
different types of soil. In your experiment you measure the average growth of the
dandelions in each of two soil types.
Soil a: 8.0, 9.1, 13.2, 12.0, 6.3, 10.0, 11.0, 12.1, 9.8, 8.5, 12.2, 9.7, 10.1, 13.2, 10.3
Soil b: 15.5, 14.5, 12.2, 10.1, 15.0, 12.1, 13.2, 16.0, 14.2, 13.1, 9.9, 17.8, 10.3, 16.4, 19.0

Standard deviation: Soil a 2.1, Soil b 2.7

[Bar chart: average height of plants in Soil a and Soil b, with error bars.]
You notice a difference between the groups. The plants in soil b have grown taller on
average than the plants in soil a. Since the error bars are overlapping, it is hard to say
whether this observed difference is due to chance, or whether the two are really different.
Therefore you decide to perform a t-test to find out.
First, you need to determine whether the data are normally distributed, so you make a
histogram to see.
[Histogram: Frequency on the y-axis against Height of plants (+/- 0.1 cm) in classes
6.0-7.9 up to 18.0-19.9 on the x-axis, shown for Soil a and Soil b.]
Because the data appear to be normally distributed, and the sample sizes are sufficiently
large (n=16), you can proceed with the t-test.
Using the T.TEST() function of your spreadsheet software, you enter the two data ranges,
the number of tails (2 for a two-tailed test) and the test type, for example:
=T.TEST(A2:A17, B2:B17, 2, 2) for a two-sample test assuming equal variance.
The T.TEST() function of your spreadsheet software returns the p-value for the test: the
probability of obtaining data like yours if H0 were true.
https://youtu.be/DPNUpldVC4M
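If you are curious what T.TEST() does behind the scenes, here is a sketch of the equal-variance two-sample t statistic in Python (the samples are made up; turning t into a p-value still requires a spreadsheet or a statistics library):

```python
from math import sqrt
from statistics import mean, stdev

a = [1.0, 2.0, 3.0, 4.0]   # made-up sample 1
b = [3.0, 4.0, 5.0, 6.0]   # made-up sample 2

na, nb = len(a), len(b)
# Pool the two sample variances, weighted by their degrees of freedom.
pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
# The t statistic: difference of means divided by its standard error.
t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))   # ≈ -2.19
```

The larger the absolute value of t, the smaller the p-value that T.TEST() reports.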
As the p-value decreases, the likelihood that the difference is due to random chance also
decreases. Eventually, the p-value is so small that it is no longer reasonable to assume that
the null hypothesis is true, and therefore the null hypothesis can be rejected. The most
commonly used threshold level (also called the alpha value) for rejecting the null hypothesis
is 0.05. That means that if the p-value sinks below 0.05 (5%), then the null hypothesis can
be rejected and the alternative hypothesis accepted.
In the case of these data the p-value is 0.00008. This is well below the threshold level of
0.05, and therefore the null hypothesis can be rejected. The observed difference between
the two soils is very unlikely to be due to random chance: it is a statistically significant
difference.
A N OVA
The ANOVA test should be used if you have more than two groups to compare. Though
you could theoretically perform many t-tests between each of the possible combinations,
this is inefficient, and mathematically risky. Every time you perform a t-test, there is a
small probability that your difference was in fact due to a random fluke, and not a real
difference. If you perform many t-tests, the likelihood of making such an error increases.

H0: There is no significant difference between any of the groups. The variability
between groups is due to chance.

HA: At least one of the groups appears to be different from the rest. The variability
between groups is not likely to be due to chance.
The ANOVA test produces a p-value that can be interpreted in the same way as in the t-
test. If the p-value is below the threshold of 0.05, then the null hypothesis can be rejected
and the alternative hypothesis concluded. It is not clear from the ANOVA test what groups
are different from each other, but instead that the variability between the groups is not due
to chance.
You can learn how to perform the ANOVA test in Excel or LibreOffice here:
Excel: https://youtu.be/qQSQr_JldyY
LibreOffice: https://youtu.be/TxTKq4W8qX8
THE MANN-WHITNEY U TEST
The Mann-Whitney U test compares the rank order of the data points rather than their
actual values, similar to the median. This type of hypothesis test, which does not rely on
the actual values of the measurements, is called a non-parametric test.
The null and alternative hypotheses are the same as for the t-test:
HA: The observed difference between the groups is statistically significant, and not
likely due to chance.
Although you can painstakingly calculate the Mann-Whitney U statistic by hand, with
some help from your spreadsheet software, this is not a requirement of the IB. Instead
simply enter your two sets of data in this online calculator:
https://www.socscistatistics.com/tests/mannwhitney/default2.aspx
The calculator gives you the p-value for the test, which you compare to the alpha threshold
of 0.05 as in the t-test. If the p-value is below 0.05, you can reject the null hypothesis and
conclude that there is a statistically significant difference.
THE PEARSON CORRELATION TEST
This test tells you whether a linear correlation is statistically significant. Before using it,
check the following conditions:
• The data are paired, that is, there are two measurements or values for each data point,
the dependent and independent variables.
If these conditions are met, you can continue with the test.
H0: There is no correlation in the data. The observed trend is due to random chance.
In your spreadsheet software, enter the formula to calculate r, the Pearson correlation
coefficient, for example: =PEARSON(A2:A13, B2:B13) or, equivalently,
=CORREL(A2:A13, B2:B13).
This formula calculates the r value that then needs to be compared to a critical value table
(see critical value tables here).
The critical value table shows what values of r are significant. The strength of the test
depends on the number of data points included (n), so the critical value also changes with
n. Keep in mind that one data point has two measurements. If you were comparing, for
example, height and weight of plants, and measured height and weight of 12 plants, then n
would be 12, not 24. You simply need to compare the r value that you calculated to the
critical value corresponding to the number of data points you used. If the absolute value of
r is greater than the critical value, then the correlation is statistically significant and not
likely to be due to chance.
For example, if your r value was -0.65 and you had 10 data points, you would compare that
r value to the critical value 0.521 from the critical value table and conclude that the
absolute value of r is greater than the critical value. Therefore the correlation is statistically
significant.
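The comparison step itself is easy to script; here it is with the example values just given:

```python
r = -0.65              # calculated Pearson correlation coefficient
critical = 0.521       # critical value for n = 10, from the table in this handbook
significant = abs(r) > critical   # True: the correlation is statistically significant
```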
THE SPEARMAN RANK CORRELATION TEST
The Spearman rank correlation is denoted by the symbol ρ (rho) or rs. Analogous to how
the Mann-Whitney U test compares rank instead of the actual values of the data, the
Spearman rank test determines a correlation in the data by looking at the rank order of the
data instead of its actual values. Use this test instead of the Pearson correlation test if any
of the following are true:
• The data do not appear to have a linear trend, but do trend either positively or
negatively.
The null and alternative hypotheses are the same as for the Pearson correlation test:
H0: There is no correlation in the data. The observed trend is due to random chance.
To calculate the test statistic, simply enter your data in the calculator at this site, and
interpret the p-value as in the other tests:
https://www.socscistatistics.com/tests/spearman/default2.aspx
https://youtu.be/o0VhMWeotFg
NEAREST NEIGHBOUR ANALYSIS
In geography, the nearest neighbour analysis can be used to determine if the spacing
between points is random, clustered, or ordered. First, the data is collected by measuring
the distance between each location (e.g. tree) and its nearest neighbour. The nearest
neighbour index (NNI or Rn) is then calculated according to the following formula:
NNI = 2D̄ √(n / A)
The NNI value can indicate that the points are clustered, random, or ordered, depending on
its value:
[Scale: an NNI near 0 indicates clustered points, 1.0 random points, and 2.15 perfectly
ordered points.]

[Map: n = 9 points in an area of A = 36 m².]
Since n = 9, A = 36 m², and D̄ = 1.17 m, the NNI value is calculated as:

NNI = 2D̄ √(n / A) = 2 · 1.17 · √(9 / 36) ≈ 1.2
For n= 9, the critical value table gives 0.713 as the limit below which the points would be
considered clustered, and 1.287 as the upper limit, above which the data would be
considered ordered. You can therefore conclude that the trees are randomly dispersed.
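The whole NNI calculation fits in a few lines of Python, shown here with the worked example’s numbers:

```python
from math import sqrt

def nearest_neighbour_index(d_bar, n, area):
    """NNI = 2 * (mean nearest-neighbour distance) * sqrt(n / area)."""
    return 2 * d_bar * sqrt(n / area)

# Worked example: 9 trees in 36 m², mean nearest-neighbour distance 1.17 m.
nni = nearest_neighbour_index(d_bar=1.17, n=9, area=36)
# ≈ 1.17, between the critical values 0.713 and 1.287: randomly dispersed
```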
CRITICAL VALUE TABLES
Pearson r critical values:

n      Critical value
2      0.988
3      0.900
4      0.805
5      0.729
6      0.669
7      0.622
8      0.582
9      0.549
10     0.521
11     0.497
12     0.476
13     0.458
14     0.441
15     0.426
16     0.412
17     0.400
18     0.389
19     0.378
20     0.369
21     0.360
26     0.323
31     0.296
36     0.275
41     0.257
46     0.243
51     0.231
61     0.211
71     0.195
81     0.183
91     0.173
101    0.164

Nearest neighbour index critical values:

n      clustered   uniform
2      0.392       1.608
3      0.504       1.497
4      0.570       1.430
5      0.616       1.385
6      0.649       1.351
7      0.675       1.325
8      0.696       1.304
9      0.713       1.287
10     0.728       1.272
11     0.741       1.259
12     0.752       1.248
13     0.762       1.239
14     0.770       1.230
15     0.778       1.222
16     0.785       1.215
17     0.792       1.209
18     0.797       1.203
19     0.803       1.197
20     0.808       1.192
21     0.812       1.188
22     0.817       1.183
23     0.821       1.179
24     0.825       1.176
25     0.828       1.172
26     0.831       1.169
27     0.835       1.166
28     0.838       1.163
29     0.840       1.160
30     0.843       1.157
31     0.846       1.155
32     0.848       1.152
33     0.850       1.150
34     0.853       1.148
35     0.855       1.145
36     0.857       1.143
37     0.859       1.141
38     0.861       1.140
39     0.862       1.138
40     0.864       1.136
41     0.866       1.134
42     0.867       1.133
43     0.869       1.131
44     0.870       1.130
45     0.872       1.128
50     0.878       1.122
60     0.889       1.111
70     0.897       1.103
80     0.904       1.096
90     0.909       1.091
100    0.914       1.086
HOW TO USE SPREADSHEET SOFTWARE
You will need to spend some time learning how to use your brand of software on your
platform, as they all differ somewhat. Excel© (subscription based) and LibreOffice©
(freeware) are both good options, but you could also use Numbers© on MacOS, or Google©
Sheets, though the latter has some significant limitations. Many of these calculations can
also be performed on a TI-Nspire© handheld. Searching the web, or using your software’s
help function will usually yield quick answers to tricky problems. Here are some tips and
resources that might help you on your way:
Tip: Be sure you know whether your software expects a decimal point ( . ) or a comma ( , )
as a separator. If you use the wrong one, the computer does not recognise your data as
numbers, but instead treats it as text, which causes all calculations to fail. Use the ‘search
and replace’ function of your software to change all of them at once.
Tip: Use this site to find the appropriate function in your software’s language:
http://www.excelfunctions.eu
The Moodle site 7AB Tabellenkalkulation has guides for performing simple calculations
and making diagrams in Excel© and LibreOffice© here:
https://moodle.tsn.at/course/view.php?id=36089
At Mr. Schauer’s youtube channel you can find a handful of videos on data analysis:
bit.ly/mrschauersyoutube
For more information on calculating quartiles and the inter-quartile range using excel, visit
this site:
https://www.statisticshowto.com/probability-and-statistics/interquartile-range/
#IQRExcel
For lots of in-depth information on the geography Internal Assessment, visit these pages:
https://www.thinkib.net/geography/page/22606/ia-student-guide
https://sites.google.com/site/geographyfais/fieldwork
For more help with biological statistics for the IA, visit this site:
https://www.biologyforlife.com/statistics.html
For a very useful handbook of basic statistics, look for a copy of this book:
St. John, P. and Richardson, D.A. (1996). Methods of Statistical Analysis of Fieldwork Data.
Geographical Association.