Professional Documents
Culture Documents
Data Presentation For Doctors
Data Presentation For Doctors
Biostatistics 101:
Data Presentation
Y H Chan
plot along a curve instead of a line. Take note that the Descriptive statistics using skewness and kurtosis
interest here is the central portion of the line, severe Fig. 3 shows the three types of skewness (right:
deviations means non-normality. Deviations at the skew >0, normal: skew ~0 and left: skew <0).
“ends” of the curve signifies the existence of outliers. Skewness ranges from -3 to 3. Acceptable range
Fig. 3 shows the histograms and their corresponding for normality is skewness lying between -1 to 1.
Q-Q plots of the three datasets. Normality should not be based on skewness
Fig. 3
Skew = 1.47
Kurtosis = 2.77
Right skew
Skew = -0.71
Kurtosis = -0.47
Left skew
Skew = 0.31
Kurtosis = -0.32
Normal
Singapore Med J 2003 Vol 44(6) : 283
alone; the kurtosis measures the “peakness” of Table IV. Flowchart for normality checking.
the bell-curve (see Fig. 4). Likewise, acceptable range
1. Small samples* (n<30): always assume not normal.
for normality is kurtosis lying between -1 to 1.
The corresponding skewness and kurtosis values 2. Moderate samples (30-100).
for the three illustrative datasets are shown in Fig. 3. If formal test is significant, accept non-normality otherwise
double-check using graphs, skewness and kurtosis to confirm
normality.
Fig. 4
3. Large samples (n>100).
0.2 kurtosis ~0
Measures of Spread
The measures of central tendency give us an
0.1 indication of the typical score in a sample. Another
kurtosis <0
important descriptive statistics to be presented for
0.0
quantitative data is its variability – the spread of the
data scores.
-6 -4 -2 0 2 4 6
The simplest measure of variation is the range
which is given by the difference between the
maximum and minimum scores of the data. However,
this does not tell us what’s happening in between
Formal statistical tests – Komolgorov Smirnov one
these scores.
Sample test and Shapiro Wilk test
A popular and useful measure of spread is the
standard deviation (sd) which tells us how much all
Here the null hypothesis is: Data is normal
the scores in a dataset cluster around the mean. Thus
we would expect the sd of the age distribution of a
primary one class of pupils to be zero (or at least a
Table III. Normality tests.
small number). A large sd is indicative of a more
Kolmogorov-Smirnov Shapiro-Wilk varied data scores. Fig. 5 shows the spread of two
Statistic df Sig. Statistic df Sig. distributions with the same mean.
0.3
From the p-values (sig), see Table III, both sd = 1.0
Right skew and Left skew are not normal (as
expected!). To test for normality, in SPSS, use 0.2
Fig. 6 Distribution of data for a normal curve. For example, the mean difference in BP reduction
between an active treatment and control is 7.5 (95%
0.4
CI 1.5 to 13.5) mmHg. It looks like the active is
68% “fantastic” with a 7.5 mmHg reduction but from the
0.3
large confidence interval of 12 (= 13.5 - 1.5), it could
possibly be that the study was conducted with a
95%
0.2 small sample size or the variation of the difference
was large. Thus from the CI, we are able to assess
0.1 99% the quality of the results.
When should the usual 95% CI be presented.
Surely for treatment differences, it should be specified.
0.0
How about variables like age? There’s no need for
-4 -3 -2 -1 0 1 2 3 4 age in demographics but if we are presenting the age
of risk of having a disease, for example, then a 95%
CI would make sense.
Here comes the million dollar question? Does a The error bar plot is a convenient way to show
small sd imply good research data? I believe most of the CI, see Fig. 7.
you (at least 90%) would say yes! Well, partly you are
right – it depends. Fig. 7 Error bar plot.
For the age distribution of the subjects enrolled
30
in your research study, you would not want the sd to
be small as this will imply that your results obtained
28
could not be generalised to a larger age-range group.
On the other hand, you would hope that the sd of
26
the difference in outcome response between two
Age
We shall discuss the statistical analysis of 5. Pagano M and Gauvreau K. Principles of biostatistics, Duxbury
Press. Wadsworth Publishing Company, 1993.
quantitative data in our next issue (Biostatistics 6. Lloyd D Fisher, Gerald Van Belle, Biostatistics. A methodology for
102: Quantitative Data – Parametric and Non- the health sciences. John Wiley & Sons, 1993.
Parametric tests). 7. Campbell MJ, Machin D. Medical statistics —A commonsense
approach. John Wiley & Sons, 1999.
8. Bland M. An introduction to medical statistics. Oxford University
REFERENCES Press, 1995.
1. Chan YH. Randomised controlled trials (RCTs) — Essentials, 9. Armitage P and Berry G. Statistical methods in medical research.
Singapore Medical Journal 2003; Vol 44(2):60-3. 3rd edition, Blackwell Science, 1994.
2. Beth Dawson-Saunders, Trapp RG. Basic and clinical biostatistics. 10. Altman DG. Practical statistics for medical research. Chapman
Prentice Hall International Inc, 1990. and Hall, 1991.
3. Bowers D, John Wiley and Sons. Statistics from scratch for Health 11. Larry Gonick and Woollcott Smith. The cartoon guide to statistics.
Care professionals, 1997. HarperCollins Publishers, Inc. 1993.
4. Bowers D, John Wiley and Sons. Statistics further from scratch for 12. Chan YH. Randomised controlled trials (RCTs) — Sample size: the
Health Care professionals, 1997. magic number? Singapore Medical Journal 2003; Vol 44 (4):172-4.