Choosing Appropriate Descriptive Statistics, Graphs and Statistical Tests
Brian Yuen
15 January 2013
Using appropriate statistics and graphs
Slide - 2
Using appropriate statistics and graphs…
Slide - 3
[Figure: example graphs; panel labels X = continuous variable, Z = categorical variable]
All these graphs are available in Chart Builder, from the "Choose from:" list.
Flow chart of commonly used descriptive statistics and graphical illustrations
Categorical data
– Frequency
– Percentage (Row, Column or Total)
Continuous data
Graphical illustrations
– Histogram (can be plotted against a categorical variable)
– Box & Whisker plot (can be plotted against a categorical variable)
– Dot plot (can be plotted against a categorical variable)
– Scatter plot (two continuous variables)
Choosing appropriate statistical test
Slide - 5
Flow chart of commonly used statistical tests
Exposure variable    Normal               Skew
1 group              One-sample t test    Sign test / Signed rank test
• Objective: any difference between the group average and the published value
• Outcome & type:
• Exposure & type:
• If the continuous outcome is
– Normally distributed
– Not Normally distributed
CONTINUOUS & ORDINAL DATA
Case Study 2 Slide - 10
• A clinical trial investigating:
– the effect of two physiotherapy treatments (standard and enhanced exercise) for patients with
a broken leg
– on their fitness level (length of time walking on a treadmill before stopping through
tiredness)
CONTINUOUS & ORDINAL DATA
Case Study 3 Slide - 11
• Now each patient performs the walking test before and after enhanced
physiotherapy treatment
– data might be presented as two variables, one as before data and the other as after
data, but the values for individual patients are paired
• Objective: any difference between the before and the after averages
• Number of outcomes:
• Outcomes & type:
• If the difference in outcomes (e.g. after - before) is
– Normally distributed
– Not Normally distributed
CONTINUOUS & ORDINAL DATA
Case Study 4 Slide - 12
• Based on Case Study 2 (standard vs. enhanced exercises), but now with a control
group
– i.e. patients without a broken leg
• Note –
Note –
CONTINUOUS & ORDINAL DATA
Case Study 6 Slide - 14
• Before the participants started their fitness test, their blood pressure (BP) was recorded by two different
machines
– machine 1 was the ‘gold standard’
– machine 2 was newly made and claimed to be more accurate
– aim to validate the measurements recorded from machine 2 by assessing the level of agreement with that
obtained from machine 1
BINARY DATA
Case Study 7 Slide - 16
• Fitness is now assessed only as Unfit / Fit
– could be as a result of dichotomising the previous continuous outcome (0-5 minutes = Unfit;
>5 minutes = Fit)
– investigate whether the proportions of Unfit and Fit are equal (i.e. 50% each) after the
standard treatment
– or compare the proportions to specific values (e.g. 10% Fit, 90% Unfit)
BINARY DATA
Case Study 8 Slide - 17
• Similar setting as Case Study 2, but with the binary outcome defined from Case Study 7
(Unfit / Fit)
– to find out if the enhanced treatment is better than the standard treatment, i.e. moves more
patients into the Fit category
BINARY DATA
Case Study 9 Slide - 18
• Fitness still assessed as Unfit / Fit, but we now have only one group of patients
assessed before and after enhanced physiotherapy
– each patient was measured before and after treatment
– their status in fitness may change
– similar to Case Study 3
• Objective: any change in status
• Number of outcomes:
• Outcomes & type:
2x2 table layout: rows = Before (Unfit, Fit); columns = After (Unfit, Fit)
BINARY DATA
Case Study 10 Slide - 19
• Recall resting blood pressure (BP) was recorded by two different machines (machine 1 and 2)
on our participants from Case Study 6
– the measurements were now categorised as Low BP and High BP
– could be as a result of dichotomising the previous continuous outcome by the default settings
from the two machines
– aim to validate the status recorded from machine 2 by assessing the level of agreement with
that obtained from machine 1
• Objective: any agreement between measuring tools
• Number of outcomes:
• Outcomes & type:
2x2 table layout: rows = Mac. 1 (Low, High); columns = Mac. 2 (Low, High)
SURVIVAL DATA
Case Study 11 Slide - 20
• A clinical trial investigating the survival time of patients with a particular cancer
– patients are being randomised into a number of treatment groups
– they are then monitored until the end of the study
– the length of time between first diagnosis and death is recorded
– some people will still be alive at the end of study and we don’t want to exclude them
• Choice of test:
–
• Note –
Comparing a binary outcome between two groups – data presented as a 2x2 table Slide - 21
             Unfit     Fit       Total
Standard     a = 80    b = 140   220
Enhanced     c = 20    d = 220   240
Percentage of Fit in standard group: b/(a+b) = 140/220 (63.6%)
Percentage of Fit in enhanced group: d/(c+d) = 220/240 (91.7%)
Slide - 22
Parameter                             Formula                    Estimate (95% CI)
Absolute difference in proportions    d/(c+d) - b/(a+b)          28.0% (21%, 35%)*
Relative risk                         [d/(c+d)] / [b/(a+b)]      1.44 (1.29, 1.60)
Odds ratio                            ad/(bc)                    6.29 (3.69, 10.72)
* Asymptotic 95% confidence interval (calculated in CIA)
Other 95% confidence intervals calculated in SPSS
• Reminder: Report confidence intervals for ALL key parameter estimates
– If the 95% confidence interval for a difference excludes 0, the result is statistically significant at the 5% level
e.g. Absolute difference
– If the 95% confidence interval for a ratio excludes 1, the result is statistically significant at the 5% level
e.g. Relative risk and Odds ratio
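The three parameters and their intervals can be reproduced by hand. The sketch below is only an illustration: the cell counts are the ones implied by the two percentages on Slide 21, the a, b, c, d layout (rows = treatment, columns = Unfit / Fit) is an assumption, and the interval formulas are the usual large-sample ones (Wald for the difference, log-scale for the two ratios), which may differ slightly from what CIA or SPSS use internally.

```python
import math

# Counts implied by 140/220 Fit (standard) and 220/240 Fit (enhanced).
# Assumed layout: a, b = standard Unfit, Fit; c, d = enhanced Unfit, Fit.
a, b = 80, 140
c, d = 20, 220

Z = 1.96  # normal quantile for a 95% interval

p_std = b / (a + b)   # proportion Fit, standard group
p_enh = d / (c + d)   # proportion Fit, enhanced group

# Absolute difference in proportions with an asymptotic (Wald) interval.
diff = p_enh - p_std
se_diff = math.sqrt(p_std * (1 - p_std) / (a + b) + p_enh * (1 - p_enh) / (c + d))
diff_ci = (diff - Z * se_diff, diff + Z * se_diff)

# Relative risk; the interval is built on the log scale and back-transformed.
rr = p_enh / p_std
se_log_rr = math.sqrt(1 / d - 1 / (c + d) + 1 / b - 1 / (a + b))
rr_ci = (rr * math.exp(-Z * se_log_rr), rr * math.exp(Z * se_log_rr))

# Odds ratio; same log-scale construction.
odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_ci = (odds_ratio * math.exp(-Z * se_log_or), odds_ratio * math.exp(Z * se_log_or))

print(f"Difference: {diff:.1%} ({diff_ci[0]:.0%}, {diff_ci[1]:.0%})")
print(f"Relative risk: {rr:.2f} ({rr_ci[0]:.2f}, {rr_ci[1]:.2f})")
print(f"Odds ratio: {odds_ratio:.2f} ({or_ci[0]:.2f}, {or_ci[1]:.2f})")
```

The difference interval excludes 0 and both ratio intervals exclude 1, so all three parameters lead to the same conclusion for these data.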
Advantages and disadvantages of absolute and relative changes, and odds ratios Slide - 23
Absolute difference
• simplest to calculate and to interpret
• when applied to number of subjects in a group gives number of subjects expected to benefit
• 1/(absolute difference) gives NNT – ‘number needed to treat’ to see one additional positive response
Relative risk
• intuitively appealing
• a multiplicative effect – proportion (risk) of failure in the treatment group examined relative to (or compared to) that in the reference group
• different result depending on whether risks of ‘Fit’ or ‘Unfit’ are examined and whether ‘Standard exercise’ group is selected as the reference level
• natural parameter for cohort studies
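The two absolute-difference rules above (subjects expected to benefit, and NNT) amount to simple arithmetic. A minimal sketch, using the Fit proportions from the 2x2 example (140/220 standard, 220/240 enhanced); the function names are made up for illustration, and rounding the NNT up to a whole patient is a common convention rather than something the slide specifies.

```python
import math

def expected_to_benefit(p_treatment: float, p_reference: float, group_size: int) -> float:
    """Absolute difference applied to a group of the given size."""
    return (p_treatment - p_reference) * group_size

def number_needed_to_treat(p_treatment: float, p_reference: float) -> int:
    """NNT = 1 / (absolute difference), rounded up to whole patients."""
    diff = p_treatment - p_reference
    if diff <= 0:
        raise ValueError("this sketch only handles a positive benefit")
    return math.ceil(1 / diff)

p_standard = 140 / 220
p_enhanced = 220 / 240

extra_fit_per_100 = expected_to_benefit(p_enhanced, p_standard, 100)
nnt = number_needed_to_treat(p_enhanced, p_standard)

print(f"Extra Fit patients per 100 treated: {extra_fit_per_100:.0f}")
print(f"Number needed to treat: {nnt}")
```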
CONTINUOUS & ORDINAL DATA
Case Study 12 Slide - 24
• Now, in the physiotherapy trial, we wanted to investigate
– if there was any relationship between the participants’ fitness level and their age at assessment
– we suspected that age at assessment affected their fitness level regardless of the treatment group they
were in
– quantify the relationship by the direction, strength, and magnitude
CONTINUOUS & ORDINAL DATA
Case Study 13 Slide - 25
• In Case Study 12, we found that age at assessment had some linear relationship with
participants’ fitness level
– needed to quantify this relationship, i.e. what is the average fitness level at different ages at
assessment
– also wanted to predict fitness level for future patients, given their age at assessment
• Objective: set up a statistical model to quantify the effect of exposure variable on the outcome
variable
• Outcome & type:
• Exposure & type:
• Choice of test:
–
• Note –
BINARY DATA
Case Study 14 Slide - 26
• Similar analysis was performed as in Case Study 13, but
– used the binary fitness level (Unfit / Fit) instead of the continuous fitness level
– and wanted to predict the status of fitness level (Unfit / Fit) for future patients, given their
age at assessment
• Objective: set up a statistical model to quantify the effect of exposure variable on the outcome
variable
• Outcome & type:
• Exposure & type:
• Choice of test:
–
• Note –
BINARY DATA
Case Study 15 Slide - 27
• Using the logistic regression model from Case Study 14, we can
– aim to evaluate the predictive performance of the regression model developed given we know the true
outcome status of fitness level for each participant
– investigate the optimal predictive performance of the model
– relate the results to an individual participant indicating the likelihood of them having a specific status of
fitness
SURVIVAL DATA
Case Study 16 Slide - 28
• Recall the clinical trial investigating the survival time of patients with a particular cancer (Case Study 11)
– age at randomisation is now considered as an important factor in this relationship regardless of the
treatment group
– still interested in the length of time between first diagnosis and death
– note that censored data are still present because some people dropped out during follow-up or are still
alive at the end of the study, and we want to make use of this information
• Objective: set up a statistical model to quantify the relationship between the exposure variable and the survival
status / time
• Outcome & type:
• Exposure & type:
• Choice of test:
–
• Note –
Other web and software resources
Slide - 30
• UCLA – What statistical analysis should I use?
– http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm
• DISCUS
– Discovering Important Statistical Concepts Using Spreadsheets
– Interactive spreadsheets, designed for teaching statistics
– Web-sites for download and information -
http://www.coventry.ac.uk/ec/research/discus/discus_home.html
• Choosing the correct statistical test
– http://bama.ua.edu/~jleeper/627/choosestat.html
• SPSS for Windows
– Help
– Statistics Coach
• Statistics for the Terrified
Solutions to Case Studies
CONTINUOUS & ORDINAL DATA
Case Study 1 Slide - 32
• A simple study investigating:
– the fitness level of our locally selected group of healthy volunteers
– with the published average value on fitness level which was done previously on the national
level
– fitness level was measured by the length of time walking on a treadmill before stopping
through tiredness
• Objective: any difference between the group average and the published value
• Outcome & type: fitness level (length of time) – continuous
• Exposure & type: one group only
• If the continuous outcome is
– Normally distributed: One-sample t test
– Not Normally distributed: Sign test / Signed rank test
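As an illustration of the two routes for Case Study 1, the sketch below computes a one-sample t statistic and an exact two-sided sign test using only the Python standard library. The treadmill times and the published value are made-up numbers, and the function names are hypothetical.

```python
import math
import statistics

def one_sample_t(values, reference):
    """Return (t statistic, degrees of freedom) for H0: mean == reference."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return (mean - reference) / (sd / math.sqrt(n)), n - 1

def sign_test_p(values, reference):
    """Exact two-sided sign test p-value for H0: median == reference."""
    above = sum(v > reference for v in values)
    below = sum(v < reference for v in values)  # values equal to the reference are dropped
    n = above + below
    k = min(above, below)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical treadmill times (minutes) and a hypothetical published average.
times = [6.1, 7.4, 5.8, 8.0, 6.9, 7.2, 6.5, 7.8]
published = 6.0

t_stat, df = one_sample_t(times, published)
p_sign = sign_test_p(times, published)

print(f"t = {t_stat:.2f} on {df} degrees of freedom")
print(f"sign test p = {p_sign:.4f}")
```

The t statistic still has to be referred to a t distribution with n - 1 degrees of freedom to get a p-value; a statistics package does that step directly (for example, SciPy provides `scipy.stats.ttest_1samp`).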
CONTINUOUS & ORDINAL DATA
Case Study 2 Slide - 33
• A clinical trial investigating:
– the effect of two physiotherapy treatments (standard and enhanced exercise) for patients with
a broken leg
– on their fitness level (length of time walking on a treadmill before stopping through
tiredness)
CONTINUOUS & ORDINAL DATA
Case Study 3 Slide - 34
• Now each patient performs the walking test before and after enhanced
physiotherapy treatment
– data might be presented as two variables, one as before data and the other as after
data, but the values for individual patients are paired
• Objective: any difference between the before and the after averages
• Number of outcomes: 2 (before and after)
• Outcomes & type: fitness level – continuous,
paired (or related)
• If the difference in outcomes (e.g. after - before) is
– Normally distributed: Paired t test
– Not Normally distributed: Wilcoxon signed rank test
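Both tests work on the within-patient differences (after - before). The following sketch, with made-up walking times in seconds, computes the paired t statistic and an exact Wilcoxon signed rank p-value by enumerating every sign pattern. It assumes a small sample with no zero differences and no tied absolute differences; the function names are hypothetical.

```python
import itertools
import math
import statistics

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for the after - before differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def signed_rank_exact(before, after):
    """Wilcoxon signed rank statistic and exact two-sided p-value.

    Assumes no zero differences and no ties among the absolute differences.
    """
    diffs = sorted((a - b for b, a in zip(before, after)), key=abs)
    n = len(diffs)
    total = n * (n + 1) // 2
    w_plus = sum(rank for rank, d in enumerate(diffs, start=1) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under H0 each rank is equally likely to carry a + or a - sign.
    extreme = 0
    for signs in itertools.product((0, 1), repeat=n):
        s = sum(rank for rank, sign in enumerate(signs, start=1) if sign)
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / 2 ** n

# Hypothetical walking times in seconds for eight patients.
before = [240, 330, 192, 366, 288, 300, 234, 264]
after = [306, 360, 270, 360, 384, 354, 252, 378]

t_stat, df = paired_t(before, after)
w, p_exact = signed_rank_exact(before, after)

print(f"paired t = {t_stat:.2f} on {df} degrees of freedom")
print(f"signed rank W = {w}, exact p = {p_exact:.4f}")
```

For larger samples, or data with ties, a package routine is the practical choice (SciPy offers `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`).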
CONTINUOUS & ORDINAL DATA
Case Study 4 Slide - 35
• Based on Case Study 2 (standard vs. enhanced exercises), but now with a control
group
– i.e. patients without a broken leg
BINARY DATA
Case Study 8 Slide - 39
• Similar setting as Case Study 2, but with the binary outcome defined from Case Study 7
(Unfit / Fit)
– to find out if the enhanced treatment is better than the standard treatment, i.e. moves more
patients into the Fit category
BINARY DATA
Case Study 9 Slide - 40
• Fitness still assessed as Unfit / Fit, but we now have only one group of patients
assessed before and after enhanced physiotherapy
– each patient was measured before and after treatment
– their status in fitness may change
– similar to Case Study 3
• Objective: any change in status
• Number of outcomes: 2 (before and after)
• Outcomes & type: fitness level category – binary, paired (or related)
2x2 table layout: rows = Before (Unfit, Fit); columns = After (Unfit, Fit)
• Choice of test:
– McNemar’s test
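McNemar's test uses only the discordant pairs, i.e. the patients whose status changed between the two assessments. A minimal sketch of the exact (binomial) version follows; the counts are invented for illustration and the function name is hypothetical.

```python
import math

def mcnemar_exact(unfit_to_fit: int, fit_to_unfit: int) -> float:
    """Exact two-sided McNemar p-value from the two discordant cells.

    Under H0 (no change in status) each discordant pair is equally likely
    to fall in either cell, so the smaller count is Binomial(n, 0.5).
    """
    n = unfit_to_fit + fit_to_unfit
    if n == 0:
        return 1.0
    k = min(unfit_to_fit, fit_to_unfit)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical paired results: 12 patients moved Unfit -> Fit, 3 moved Fit -> Unfit.
p_value = mcnemar_exact(unfit_to_fit=12, fit_to_unfit=3)
print(f"McNemar exact p = {p_value:.4f}")
```

Patients who stayed Unfit or stayed Fit do not enter the calculation, which is why the test differs from an ordinary chi-squared test on the same 2x2 table.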
BINARY DATA
Case Study 10 Slide - 41
• Recall resting blood pressure (BP) was recorded by two different machines (machine 1 and 2)
on our participants from Case Study 6
– the measurements were now categorised as Low BP and High BP
– could be as a result of dichotomising the previous continuous outcome by the default settings
from the two machines
– aim to validate the status recorded from machine 2 by assessing the level of agreement with
that obtained from machine 1
• Objective: any agreement between measuring tools
• Number of outcomes: 2 (machines)
• Outcomes & type: blood pressure status (from each machine) – binary, paired (or related)
2x2 table layout: rows = Mac. 1 (Low, High); columns = Mac. 2 (Low, High)
• Choice of test:
– Kappa statistic
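The kappa statistic compares the observed agreement with the agreement expected by chance from the row and column totals. A short sketch of Cohen's kappa, with invented counts and a hypothetical function name:

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table.

    table[i][j] = number of participants placed in category i by machine 1
    and in category j by machine 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical counts; rows = machine 1 (Low, High), columns = machine 2 (Low, High).
bp_table = [[40, 10],
            [5, 45]]

kappa = cohen_kappa(bp_table)
print(f"kappa = {kappa:.2f}")
```

Here the machines agree on 85 of 100 participants, chance alone would give 50%, and kappa expresses the excess agreement as a fraction of the maximum possible excess.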
SURVIVAL DATA
Case Study 11 Slide - 42
• A clinical trial investigating the survival time of patients with a particular cancer
– patients are being randomised into a number of treatment groups
– they are then monitored until the end of the study
– the length of time between first diagnosis and death is recorded
– some people will still be alive at the end of study and we don’t want to exclude them
CONTINUOUS & ORDINAL DATA
Case Study 13 Slide - 44
• In Case Study 12, we found that age at assessment had some linear relationship with
participants’ fitness level
– needed to quantify this relationship, i.e. what is the average fitness level at different ages at
assessment
– also wanted to predict fitness level for future patients, given their age at assessment
• Objective: set up a statistical model to quantify the effect of exposure variable on the outcome
variable
• Outcome & type: fitness level – continuous
• Exposure & type: age at assessment – continuous
• Choice of test:
– (Simple) Linear regression
• Note – Linear regression is also appropriate when the exposure variable is categorical, e.g.
exercise treatment group (standard & enhanced), as well as controlling for other covariates
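To make the idea of quantifying and predicting concrete, here is a minimal least-squares sketch for one continuous exposure. The ages and treadmill times are invented, and `fit_line` and `predict` are hypothetical helper names; a real analysis would also report confidence intervals and check the model assumptions.

```python
import statistics

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: age at assessment (years) and treadmill time (minutes).
ages = [25, 32, 40, 47, 55, 63, 70]
minutes = [9.0, 8.4, 7.9, 7.0, 6.2, 5.5, 4.9]

intercept, slope = fit_line(ages, minutes)

def predict(age):
    """Predicted average fitness level (minutes) at a given age."""
    return intercept + slope * age

print(f"fitness = {intercept:.2f} + ({slope:.3f}) * age")
print(f"predicted time at age 50: {predict(50):.1f} minutes")
```

The slope is the estimated change in average fitness level per additional year of age, and `predict` gives the estimated average for a future patient of a given age.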
BINARY DATA
Case Study 14 Slide - 45
• Similar analysis was performed as in Case Study 13, but
– used the binary fitness level (Unfit / Fit) instead of the continuous fitness level
– and wanted to predict the status of fitness level (Unfit / Fit) for future patients, given their
age at assessment
• Objective: set up a statistical model to quantify the effect of exposure variable on the outcome
variable
• Outcome & type: fitness level category – binary
• Exposure & type: age at assessment – continuous
• Choice of test:
– (Simple) Logistic regression
• Note – Logistic regression is also appropriate when the exposure variable is categorical, e.g.
exercise treatment group (standard & enhanced), as well as controlling for other covariates
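Logistic regression models the log-odds of being Fit as a linear function of the exposure. The sketch below fits a one-predictor model by Newton-Raphson using only the standard library; the data are invented, the function names are hypothetical, and the simple loop assumes the two outcome groups overlap in age (it would not converge for perfectly separated data).

```python
import math

def predict_prob(b0, b1, x):
    """Predicted probability of Fit at exposure value x."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def fit_logistic(x, y, iterations=25):
    """Fit log-odds(Fit) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = predict_prob(b0, b1, xi)
            w = p * (1 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: age at assessment (years) and fitness status (1 = Fit, 0 = Unfit).
ages = [22, 25, 30, 34, 38, 41, 45, 48, 52, 55, 60, 64, 68, 72]
fit = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0]

b0, b1 = fit_logistic(ages, fit)

print(f"odds ratio per extra year of age: {math.exp(b1):.3f}")
print(f"predicted probability of Fit at age 40: {predict_prob(b0, b1, 40):.2f}")
```

The exponentiated slope is the odds ratio for a one-year increase in age, and `predict_prob` gives the estimated probability of Fit for a future patient.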
BINARY DATA
Case Study 15 Slide - 46
• Using the logistic regression model from Case Study 14, we can
– aim to evaluate the predictive performance of the regression model developed given we know the true
outcome status of fitness level for each participant
– investigate the optimal predictive performance of the model
– relate the results to an individual participant indicating the likelihood of them having a specific status of
fitness
SURVIVAL DATA
Case Study 16 Slide - 47
• Recall the clinical trial investigating the survival time of patients with a particular cancer (Case Study 11)
– age at randomisation is now considered as an important factor in this relationship regardless of the
treatment group
– still interested in the length of time between first diagnosis and death
– note that censored data are still present because some people dropped out during follow-up or are still
alive at the end of the study, and we want to make use of this information
• Objective: set up a statistical model to quantify the relationship between the exposure variable and the survival
status / time
• Outcome & type: time monitored & death status – survival
• Exposure & type: age at randomisation – continuous
• Choice of test:
– Cox regression
• Note – Cox regression is also appropriate when the exposure variable is categorical, e.g. treatment groups
(active & placebo), as well as controlling for other covariates