Module 1 - Intro and Presenting Data - 1pp
Foundations of Biostatistics
Timothy Dobbins
Hybrid learning
This class includes students in the lecture theatre and online
If you are an online student, please turn on your camera when asking
questions, presenting or joining the discussion.
https://www.abs.gov.au/ausstats/abs@.nsf/lookup/1307.6feature+article1mar+2009
Scope of statistics: descriptive vs inferential
Descriptive: describe characteristics of a population
https://www.aihw.gov.au/reports/mothers-babies/australias-mothers-babies/contents/demographics-of-mothers-and-babies/maternal-age
https://www.aihw.gov.au/reports/mothers-babies/australias-mothers-babies/contents/demographics-of-mothers-and-babies/key-demographics-and-statistics
Scope of statistics: descriptive vs inferential
Inferential: use a sample of data from the population to make inferences about
the whole population
http://www.healthstats.nsw.gov.au
Types of variables
Observations and variables
Kirkwood and Sterne:
• The raw data of an investigation consist of observations made on individuals
• Any aspect of an individual that is measured ... is called a variable
Types of variables
Numeric (or quantitative)
• continuous (e.g. weight, haemoglobin)
• discrete (e.g. number of children)
Categorical (or qualitative)
• binary (e.g. previous heart disease: yes/no)
• categorical (e.g. status: alive, died from cancer, died from other causes)
• ordered categorical (ordinal) (e.g. cancer stage, grade)
Observations and variables
https://www.abs.gov.au/statistics/standards/standard-sex-gender-variations-sex-characteristics-and-sexual-orientation-variables/latest-release
Suggested questions
What was your sex recorded at birth?
o Male
o Female
o Another term (please specify)
How do you describe your gender?
Gender refers to current gender, which may be different to sex recorded at birth
and may be different to what is indicated on legal documents.
Presenting numerical data
Research as storytelling
Tell the reader a simple story
Keep the reader engaged
https://www.amazon.com/Gruffalo-Julia-Donaldson/dp/0803730470
https://www.amazon.com/Matilda-Roald-Dahl/dp/0142410373
https://www.amazon.com.au/Testaments-Handmaids-Tale-Book-ebook/dp/B07KRMV57
Present evidence to support your story
Evidence is in the form of tables and figures
Easy to read tables and figures keep readers engaged
Constructing tables
One-way frequency tables
Summarises a single characteristic (e.g. age)
Frequency: the number of individuals with a certain characteristic
Relative frequency: the frequency expressed as a percentage or a proportion of
the total frequency
One-way frequency tables
Age (years)   Frequency   Relative frequency (%)
17            1           3
18            5           17
19            5           17
20            7           23
21            5           17
22            2           7
23            4           13
24            1           3
Total         30          100
One-way frequency tables
Cumulative frequency: the number of individuals in a category or below
Cumulative relative frequency: the cumulative frequency expressed as a
percentage or a proportion of the total frequency
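The four quantities defined so far can be computed in a few lines. The sketch below is illustrative only: the individual records are not shown in the slides, so the list `ages` is rebuilt from the frequency table, and treating the first column as age in years is an assumption.

```python
from collections import Counter
from itertools import accumulate

# Assumption: individual records rebuilt from the one-way frequency table
# (value repeated once per counted individual).
ages = [17] * 1 + [18] * 5 + [19] * 5 + [20] * 7 + [21] * 5 + [22] * 2 + [23] * 4 + [24] * 1

counts = Counter(ages)
total = sum(counts.values())
values = sorted(counts)
freqs = [counts[v] for v in values]
cum_freqs = list(accumulate(freqs))  # running total of frequencies

# Each row: value, frequency, relative %, cumulative frequency, cumulative %
rows = [
    (v, f, round(100 * f / total), c, round(100 * c / total))
    for v, f, c in zip(values, freqs, cum_freqs)
]

for row in rows:
    print(row)
```

The cumulative relative frequency in the final row is always 100%, which is a quick check that no category has been dropped.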
Two-way frequency tables
        BMI status
Sex     Normal            Overweight             Obese             Total
        BMI < 25 kg/m²    25 ≤ BMI < 30 kg/m²    BMI ≥ 30 kg/m²
Male    1                 9                      2                 12
Female  11                6                      0                 17
Total   12                15                     2                 29
Note: 1 value of BMI was missing
Two-way frequency tables:
column percentages
        BMI status
Sex     Normal            Overweight             Obese             Total
        BMI < 25 kg/m²    25 ≤ BMI < 30 kg/m²    BMI ≥ 30 kg/m²
        n     %           n     %                n     %           n     %
Male    1     8           9     60               2     100         12    41
Female  11    92          6     40               0     0           17    59
Total   12    100         15    100              2     100         29    100
Note: 1 value of BMI was missing
Two-way frequency tables:
row percentages
            BMI status
Sex         Normal            Overweight             Obese             Total
            BMI < 25 kg/m²    25 ≤ BMI < 30 kg/m²    BMI ≥ 30 kg/m²
Male    n   1                 9                      2                 12
        %   8                 75                     17                100
Female  n   11                6                      0                 17
        %   65                35                     0                 100
Total   n   12                15                     2                 29
        %   41                52                     7                 100
Note: 1 value of BMI was missing
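Column and row percentages differ only in the denominator. A minimal sketch using the counts from the sex by BMI status table (the dictionary layout and variable names are illustrative choices, not part of the course materials):

```python
# Counts from the two-way table: rows are sex, columns are BMI status.
counts = {
    "Male": {"Normal": 1, "Overweight": 9, "Obese": 2},
    "Female": {"Normal": 11, "Overweight": 6, "Obese": 0},
}
categories = ["Normal", "Overweight", "Obese"]

row_totals = {sex: sum(row.values()) for sex, row in counts.items()}
col_totals = {c: sum(counts[sex][c] for sex in counts) for c in categories}

# Column percentages: each cell divided by its column (BMI status) total.
col_pct = {
    sex: {c: round(100 * counts[sex][c] / col_totals[c]) for c in categories}
    for sex in counts
}

# Row percentages: each cell divided by its row (sex) total.
row_pct = {
    sex: {c: round(100 * counts[sex][c] / row_totals[sex]) for c in categories}
    for sex in counts
}

print(col_pct)
print(row_pct)
```

Column percentages answer "what share of each BMI group is male or female?", while row percentages answer "how is BMI status distributed within each sex?".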
Multi-way frequency tables
Australian Institute of Health and Welfare 2015. The health of Australia’s prisoners 2015. Cat. no. PHE 207. Canberra: AIHW
Table presentation guidelines
1. Each table (and figure) should be self-explanatory, i.e. the reader should be
able to understand it without reference to the text in the body of the report
2. Units of the variables should be given and missing records should be noted
3. A table should be visually uncluttered
From Woodward. Epidemiology: Study Design and Data Analysis, Third Edition; 2013
Table presentation guidelines
4. The rows and columns of each table should be arranged in a natural order to
help interpretation
5. Tables should have a consistent appearance throughout the report
6. Consider if there is a particular table orientation that makes a table easier to
read
From Woodward. Epidemiology: Study Design and Data Analysis, Third Edition; 2013
Presenting data graphically
Bar chart
Simple way to plot frequencies
Bars represent frequency
Horizontal (x) axis is categorical
Source: Australian Institute of Health and Welfare 2019. Cancer in Australia 2019. Cancer series no.119. Cat. no. CAN 123. Canberra: AIHW.
Clustered bar chart
Source: https://www.aihw.gov.au/reports/burden-of-disease/australian-burden-of-disease-study-impact-and-causes-of-illness-and-death-in-australia-2011/contents/highlights
Stacked bar chart
Source: Australian Institute of Health and Welfare 2017. Australia’s welfare 2017. Australia’s welfare series no. 13. AUS 214. Canberra: AIHW.
Stacked bar chart
Source: State of Australian University Research 2015–16: Volume 1 ERA National Report (ERA 2015)
Line chart
Source: Australian Institute of Health and Welfare 2016. Australia’s health 2016. Australia’s health series no. 15. Cat. no. AUS 199. Canberra: AIHW.
Pie charts
Source: https://budget.gov.au/2019-20/content/overview.htm
Pie charts
Many authors categorically reject pie charts ... Others
defend the use of pie charts in some applications. My
own opinion is that none of these visualizations is
consistently superior over any other. Depending on
the features of the dataset and the specific story you
want to tell, you may want to favor one or the other
approach.
https://serialmentor.com/dataviz/visualizing-proportions.html#a-case-for-pie-charts
Pie charts
Pie charts are evil
I have a well‐documented disdain for pie charts. In short, they
are evil. To understand how I arrived at this conclusion, let’s
look at an example.
Graphical presentation guidelines
From Woodward. Epidemiology: Study Design and Data Analysis, Third Edition; 2013
Graphical presentation guidelines
5. If the Y-axis has a natural origin, it should be included, or emphasised if it is
not included.
6. If graphs are being compared, the Y-axis should be the same across the
graphs to enable fair comparison
7. Columns of bar charts should be separated by a space
8. Three dimensional graphs should be avoided unless the third dimension
adds additional information
From Woodward. Epidemiology: Study Design and Data Analysis, Third Edition; 2013
Computing summary statistics
What do we want to know?
What is the average value?
What is the spread, or variability?
Example data
Weight (in kg) of 30 people:
Measures of central tendency
1: Mean
$\bar{x} = \dfrac{\sum x}{n}$
Weights example:
$\sum x = 2100$
$\bar{x} = 2100 / 30 = 70.0$ kg
Advantages
• uses all data points
• nice mathematical properties
Disadvantage
• affected by unusually small or large observations
Measures of central tendency
2: Median
Example (n=30):
60.0 62.5 62.5 62.5 65.0
65.0 65.0 67.5 67.5 67.5
67.5 67.5 70.0 70.0 70.0
70.0 70.0 70.0 72.5 72.5
72.5 72.5 75.0 75.0 75.0
75.0 75.0 77.5 77.5 80.0
Advantage
• not unduly affected by unusually small or large observations
Disadvantages
• difficult mathematical properties
• ‘wastes’ information
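The contrast between the two measures can be seen with the weights data. In the sketch below, the value 800.0 is a hypothetical data-entry error introduced purely for illustration; it does not appear in the slides.

```python
from statistics import mean, median

# Weights (kg) of 30 people, as listed in the median example.
weights = [
    60.0, 62.5, 62.5, 62.5, 65.0,
    65.0, 65.0, 67.5, 67.5, 67.5,
    67.5, 67.5, 70.0, 70.0, 70.0,
    70.0, 70.0, 70.0, 72.5, 72.5,
    72.5, 72.5, 75.0, 75.0, 75.0,
    75.0, 75.0, 77.5, 77.5, 80.0,
]

print(mean(weights), median(weights))

# Hypothetical error: the largest weight (80.0) mistyped as 800.0.
with_error = weights[:-1] + [800.0]

# The mean shifts noticeably; the median does not move.
print(mean(with_error), median(with_error))
```

With the original data both measures are 70.0 kg. With the mistyped value the mean rises to 94.0 kg while the median stays at 70.0 kg.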
Measures of central tendency
Mean vs Median
Measures of variability
1: Range
Minimum to maximum, or maximum − minimum
Disadvantages
• wasteful: based on only two observations
• based on the two most unusual observations
Measures of variability
2: Standard deviation
x            60.0    62.5   62.5   62.5   65.0   ...   75.0   75.0   77.5   77.5   80.0
x̅            70.0    70.0   70.0   70.0   70.0   ...   70.0   70.0   70.0   70.0   70.0
d = x − x̅    –10.0   –7.5   –7.5   –7.5   –5.0   ...   5.0    5.0    7.5    7.5    10.0
d²           100.0   56.3   56.3   56.3   25.0   ...   25.0   25.0   56.3   56.3   100.0
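The deviation table for the weights example can be reproduced step by step. This is a sketch under one assumption: the sum of squared deviations is divided by n − 1 (the sample formula), which is the divisor that reproduces the variance and standard deviation reported later in these slides.

```python
from statistics import mean, stdev, variance

# Weights (kg) of 30 people from the worked example.
weights = [
    60.0, 62.5, 62.5, 62.5, 65.0,
    65.0, 65.0, 67.5, 67.5, 67.5,
    67.5, 67.5, 70.0, 70.0, 70.0,
    70.0, 70.0, 70.0, 72.5, 72.5,
    72.5, 72.5, 75.0, 75.0, 75.0,
    75.0, 75.0, 77.5, 77.5, 80.0,
]

x_bar = mean(weights)
deviations = [x - x_bar for x in weights]   # d = x - mean
squared = [d ** 2 for d in deviations]      # d squared
sum_sq = sum(squared)

# Assumption: sample formulas, dividing by n - 1.
sample_var = sum_sq / (len(weights) - 1)
sample_sd = sample_var ** 0.5

value_range = max(weights) - min(weights)

print(sum_sq, round(sample_var, 2), round(sample_sd, 2), value_range)

# The standard library gives the same sample statistics.
print(round(variance(weights), 2), round(stdev(weights), 2))
```

The deviations themselves always sum to zero, which is why they are squared before being averaged.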
The interquartile range (IQR) represents the range within which the middle 50%
of observations lie: i.e. Q1 to Q3
Weights data: interquartile range is 67.5 to 75.0 kg
Population vs sample statistics
Population mean: $\mu = \dfrac{\sum x}{N}$
Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
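For comparison, the usual sample counterparts are sketched below. They do not survive in the extracted slide text, so they are added here as an assumption rather than as the lecturer's wording; the n − 1 divisor is consistent with the variance and standard deviation reported for the weights example.

```latex
\begin{align*}
\text{Sample mean: } \bar{x} &= \frac{\sum x}{n} \\
\text{Sample variance: } s^2 &= \frac{\sum (x - \bar{x})^2}{n - 1} \\
\text{Sample standard deviation: } s &= \sqrt{s^2}
\end{align*}
```

For the weights data, the squared deviations sum to 737.5, so $s^2 = 737.5 / 29 \approx 25.43$ kg² and $s \approx 5.04$ kg.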
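Graphing continuous data: Box-plot
[Figure: annotated box-plot. Labels from top to bottom: "outlier"; largest "non-outlier"; Q3 or 75th percentile; median (Q2 or 50th percentile); Q1 or 25th percentile; smallest "non-outlier"; "outlier". The span from Q1 to Q3 is labelled as the interquartile range. An initial version of the figure labels only Maximum*, Q3 or 75th percentile, Q1 or 25th percentile and Minimum*.]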
Stata defines an outlier as:
• any observation larger than Q3 + 1.5 × IQR or
• any observation smaller than Q1 – 1.5 × IQR
Do not automatically assume these are incorrect
Check for biological plausibility
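The 1.5 × IQR rule can be applied by hand to the weights example. The sketch below takes Q1 = 67.5 kg and Q3 = 75.0 kg as reported in these slides rather than recomputing them, because different software packages use different quartile definitions. The value 120.0 is a hypothetical observation added only to show a flagged point.

```python
# Weights (kg) of 30 people from the worked example.
weights = [
    60.0, 62.5, 62.5, 62.5, 65.0,
    65.0, 65.0, 67.5, 67.5, 67.5,
    67.5, 67.5, 70.0, 70.0, 70.0,
    70.0, 70.0, 70.0, 72.5, 72.5,
    72.5, 72.5, 75.0, 75.0, 75.0,
    75.0, 75.0, 77.5, 77.5, 80.0,
]

# Quartiles as reported in the slides (not recomputed here).
q1, q3 = 67.5, 75.0
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def flag_outliers(data):
    """Return the observations lying outside the 1.5 x IQR fences."""
    return [x for x in data if x < lower_fence or x > upper_fence]


print(lower_fence, upper_fence)
print(flag_outliers(weights))

# Hypothetical extra observation, for illustration only.
print(flag_outliers(weights + [120.0]))
```

None of the 30 recorded weights falls outside the fences (56.25 kg to 86.25 kg). A flagged value is a prompt to check the record, not evidence that it is wrong.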
Figure 1.10: Box-plots
Reporting results
Summary statistics
Report units of measurement
Don’t forget to report the unit of measure for all your summary statistics
Example weight data:
• Mean = 70.0 kg
• Median = 70.0 kg
• Mode = 70.0 kg
• Range = 60.0 to 80.0 kg (= 20.0 kg)
• IQR = 67.5 to 75.0 kg (= 7.5 kg)
• Standard deviation = 5.04 kg
• Variance = 25.43 kg²
Reporting results: decimal places
In the presentation of results:
• Range, median and interquartile range are based on observed data points, so quote to the
same number of decimal places as the original data
• Mean may be quoted with one more decimal place than the original data
• Variance, standard deviations or standard errors may be quoted to one extra decimal place
than the mean
Do not give greater precision than can be measured by the instrument used to
collect the information
Reporting results: decimal places
The precision of percentages depends on the number of observations in your
sample
For samples of fewer than 100, present no decimal places
For samples of 100 or more, present no more than one decimal place
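One way to apply this rule consistently is a small formatting helper. The function name `format_percentage` is hypothetical, and always using one decimal place for samples of 100 or more is a design choice within the "no more than one" guideline.

```python
def format_percentage(count: int, n: int) -> str:
    """Format count/n as a percentage, with precision based on sample size."""
    pct = 100 * count / n
    # Fewer than 100 observations: no decimal places; otherwise one.
    decimals = 0 if n < 100 else 1
    return f"{pct:.{decimals}f}%"


# 1 of 12 males had normal BMI (small sample: whole percentages).
print(format_percentage(1, 12))

# Hypothetical larger sample: 37 of 250.
print(format_percentage(37, 250))
```

The first call prints 8% and the second prints 14.8%.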
Rounding decimal places
All decimal places should be retained in intermediate calculations
Rounding should only be carried out at the end of the analysis
• Use the memory function on your calculator
Numbers should always be rounded, not truncated.
E.g. 0.015782 expressed to:
2 decimal places is 0.02 (truncating would give 0.01, but the first digit after the second decimal place is ≥ 5, so the 1 gets rounded up)
3 decimal places is 0.016
4 decimal places is 0.0158
5 decimal places is 0.01578
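The difference between rounding and truncating can be checked in code. The sketch below uses Python's `decimal` module with explicit half-up rounding; this is an illustrative choice, since the built-in `round` on floats uses a different tie-breaking rule. The helper name `round_half_up` is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

value = Decimal("0.015782")


def round_half_up(x: Decimal, places: int) -> Decimal:
    """Round x to the given number of decimal places, halves rounding up."""
    quantum = Decimal(10) ** -places  # e.g. places=2 gives Decimal("0.01")
    return x.quantize(quantum, rounding=ROUND_HALF_UP)


for places in (2, 3, 4, 5):
    print(places, round_half_up(value, places))

# Truncation simply discards the extra digits, which is not what we want.
truncated = value.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
print(truncated)
```

Rounding to 2 decimal places gives 0.02, whereas truncating gives 0.01.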
Research as storytelling
Tell the reader a simple story
Keep the reader engaged
Present evidence to support your story
Evidence is in the form of tables and figures
Easy to read tables and figures keep readers engaged
Summary
• Descriptive vs inferential statistics
• Important to identify the type of a variable
• Appropriate presentation and analysis
• Present and report data numerically
• Introduced different types of graphical summaries
• Compute summary statistics to describe the centre and spread of data
Questions?
Always happy to answer questions
In or after lectures
On Moodle boards
• Please do not use Moodle messages!
Via email: t.dobbins@unsw.edu.au