Professional Documents
Culture Documents
SR 8 Test Association PDF
SR 8 Test Association PDF
Review
Statistics review 8: Qualitative data – tests of association
Viv Bewick1, Liz Cheek1 and Jonathan Ball2
1Senior Lecturer, School of Computing, Mathematical and Information Sciences, University of Brighton, Brighton, UK
2Lecturer in Intensive Care Medicine, St George’s Hospital Medical School, London, UK
Published online: 30 December 2003 Critical Care 2004, 8:46-53 (DOI 10.1186/cc2428)
This article is online at http://ccforum.com/content/8/1/46
© 2004 BioMed Central Ltd (Print ISSN 1364-8535; Online ISSN 1466-609X)
Abstract
This review introduces methods for investigating relationships between two qualitative (categorical)
variables. The χ2 test of association is described, together with the modifications needed for small
samples. The test for trend, in which at least one of the variables is ordinal, is also outlined. Risk
measurement is discussed. The calculation of confidence intervals for proportions and differences
between proportions are described. Situations in which samples are matched are considered.
Keywords χ2 test of association, Fisher’s exact test, McNemar’s test, odds ratio, risk ratio, Yates’ correction
Table 1 Table 2
Numbers of patients classified by site of central venous Numbers of patients expected in each classification if there
cannula and infectious complication were no association between site of central venous cannula
and infectious complication
Infectious complication
Infectious complication
Bacteraemia/
Central line site None Exit site septicaemia Total Bacteraemia/
Central line site None Exit site septicaemia Total
Internal jugular 686 152 96 934
Internal jugular 714.5 134.1 85.4 934
Subclavian 451 35 38 524
Subclavian 400.8 75.3 47.9 524
Femoral 168 58 22 248
Femoral 189.7 35.6 22.7 248
Total 1305 245 156 1706
Total 1305 245 156 1706
Table 3 Table 4
6 10.64 12.59 16.81 22.46 Data on patients with acute myocardial infarction who took
7 12.02 14.07 18.48 24.32 part in a trial of intravenous nitrate
Table 6
Died 3 8 2 9 1 10 0 11
Survived 47 37 48 36 49 35 50 34
For large samples the three tests – χ2, Fisher’s and Yates’ – Following the lines of the test of the gradient in regression,
give very similar results, but for smaller samples Fisher’s test with some fairly minor modifications and using large sample 49
Critical Care February 2004 Vol 8 No 1 Bewick et al.
Table 8 Table 9
Number of patients according to AVPU and survival Outcomes of the study conducted by Rivers and coworkers
∑ (x
i =1
i − x) 2 (y i − y) 2 exposed to the risk factor who develop the disease compared
with those exposed to the risk factor who do not develop the
disease. This is best illustrated by a simple example. If a bag
For the data in Table 8, we obtain a test statistic of 19.33 contains 8 red balls and 2 green balls, then the probability
with 1 degree of freedom and a P value of less than 0.001. (risk) of drawing a red ball is 8/10 whereas the odds of
Therefore, the trend is highly significant. The difference drawing a red ball is 8/2. As can be seen, the measurement
between the χ2 test statistic for trend and the χ2 test statistic of odds, unlike risk, is not confined to the range 0–1. In the
in the original test is 19.38 – 19.33 = 0.05 with 2 – 1 = 1 study conducted by Rivers and coworkers [6] the odds of
degree of freedom, which provides a test of the departure death with early goal-directed therapy is 38/79 = 0.48, and
from the trend. This departure is very insignificant and sug- on the standard therapy it is 59/60 = 0.98.
gests that the association between survival and AVPU classi-
fication can be explained almost entirely by the trend. Confidence interval for a proportion
As the measurement of risk is simply a proportion, the confi-
Some computer packages give the trend test, or a variation. dence interval for the population measurement of risk can be
The trend test described above is sometimes called the calculated as for any proportion. If the number of individuals
Cochran–Armitage test, and a common variation is the in a random sample of size n who experience a particular
Mantel–Haentzel trend test. outcome is r, then r/n is the sample proportion, p. For large
samples the distribution of p can be considered to be approx-
Measurement of risk imately Normal, with a standard error of [2]:
Another application of a two by two contingency table is to
p(1 − p) / n
examine the association between a disease and a possible
risk factor. The risk for developing the disease if exposed to The 95% confidence interval for the true population propor-
the risk factor can be calculated from the table. A basic mea- tion, p, is given by p – 1.96 × standard error to p + 1.96 ×
surement of risk is the probability of an individual developing a standard error, which is:
disease if they have been exposed to a risk factor (i.e. the rela-
p ± 1.96 p(1 − p) / n
tive frequency or proportion of those exposed to the risk factor
that develop the disease). For example, in the study into early where p is the sample proportion and n is the sample size.
goal-directed therapy in the treatment of severe sepsis and The sample proportion is the risk and the sample size is the
septic shock conducted by Rivers and coworkers [6], one of total number exposed to the risk factor.
the outcomes measured was in-hospital mortality. Of the 263
patients who were randomly allocated either to early goal- For the study conducted by Rivers and coworkers [6] the
directed therapy or to standard therapy, 236 completed the 95% confidence interval for the risk for death on early goal-
therapy period with the outcomes shown in Table 9. directed therapy is 0.325 ± 1.96(0.325[1 – 0.325]/117)0.5 or
(24.0%, 41.0%), and on the standard therapy it is (40.6%,
From the table it can be seen that the proportion of patients 58.6%). The interpretation of a confidence interval is
receiving early goal-directed therapy who died is described in Statistics review 2 [3] and indicates that, for
38/117 = 32.5%, and so this is the risk for death with early those on early goal-directed therapy, the true population risk
goal-directed therapy. The risk for death on the standard for death is likely to be between 24.0% and 41.0%, and that
50 therapy is 59/119 = 49.6%. for the standard therapy between 40.6% and 58.6%.
Available online http://ccforum.com/content/8/1/46
Comparing risks therapy who died is p1 = 38/117 = 0.325 and the proportion
To assess the importance of the risk factor, it is necessary to of those on standard therapy who died is
compare the risk for developing a disease in the exposed p2 = 59/119 = 0.496. A confidence interval for the difference
group with the risk in the nonexposed group. In the study by between the true population proportions is given by:
Rivers and coworkers [6] the risk for death on the early goal-
directed therapy is 32.5%, whereas on the standard therapy it (p1 – p2) – 1.96 × se(p1 – p2) to (p1 – p2) + 1.96 × se(p1 – p2)
is 49.6%. A comparison between the two risks can be made
by examining either their ratio or the difference between them. Where se(p1 – p2) is the standard error of p1 – p2 and is cal-
culated as:
Risk ratio
p1 (1 − p1 ) p 2 (1 − p 2 ) 0.325 × 0.675 0.496 × 0.504
The risk ratio measures the increased risk for developing a + = + = 0.063
n1 n2 117 119
disease when having been exposed to a risk factor compared
with not having been exposed to the risk factor. It is given by
RR = risk for the exposed/risk for the unexposed, and it is Thus, the required confidence interval is –0.171 – 1.96 ×
often referred to as the relative risk. The interpretation of a rel- 0.063 to –0.171 + 1.96 × 0.063; that is –0.295 to –0.047.
ative risk is described in Statistics review 6 [7]. For the Rivers Therefore, the difference between the true proportions is
study the relative risk = 0.325/0.496 = 0.66, which indicates likely to be between –0.295 and –0.047, and the risk for
that a patient on the early goal-directed therapy is 34% less those on early goal-directed therapy is less than the risk for
likely to die than a patient on the standard therapy. those on standard therapy.
The calculation of the 95% confidence interval for the relative Hypothesis test
risk [8] will be covered in a future review, but it can usefully We can also carry out a hypothesis test of the null hypothesis
be interpreted here. For the Rivers study the 95% confidence that the difference between the proportions is 0. This follows
interval for the population relative risk is 0.48 to 0.90. similar lines to the calculation of the confidence interval, but
Because the interval does not contain 1.0 and the upper end under the null hypothesis the standard error of the difference
is below, it indicates that patients on the early goal-directed in proportions is given by:
therapy have a significantly decreased risk for dying as com-
pared with those on the standard therapy. p1 (1 − p1 ) p 2 (1 − p 2 ) 1 1
+ = p(1 − p) +
n1 n2 n
1 n 2
Odds ratio
When quantifying the risk for developing a disease, the ratio where p is a pooled estimate of the proportion obtained from
of the odds can also be used as a measurement of compari- both samples [5]:
son between those exposed and not exposed to a risk factor. total deaths in both samples 38 + 59 97
It is given by OR = odds for the exposed/odds for the unex- = = = 0.3856
posed, and is referred to as the odds ratio. The interpretation
total of sample sizes 117 + 119 236
of odds ratio is described in Statistics review 3 [4]. For the So:
Rivers study the odds ratio = 0.48/0.98 = 0.49, again indicat- 1 1
ing that those on the early goal-directed therapy have a se(p1 – p2) = 0.3856 × 0.6144 × + = 0.0634
117 119
reduced risk for dying as compared with those on the stan-
dard therapy. This will be covered fully in a future review. The test statistic is then:
p1 −p 2 - 0.1710
The calculation of the 95% confidence interval for the odds = = –2.71
se(p1 −p 2 ) 0.0634
ratio [2] will also be covered in a future review but, as with
relative risk, it can usefully be interpreted here. For the Rivers Comparing this value with a standard Normal distribution
example the 95% confidence interval for the odds ratio is gives p = 0.007, again suggesting that there is a difference
0.29 to 0.83. This can be interpreted in the same way as the between the two population proportions. In fact, the test
95% confidence interval for the relative risk, indicating that described is equivalent to the χ2 test of association on the
those receiving early goal-directed therapy have a reduced two by two table. The χ2 test gives a test statistic of 7.31,
risk for dying. which is equal to (–2.71)2 and has the same P value of
0.007. Again, this suggests that there is a difference between
Difference between two proportions the risks for those receiving early goal-directed therapy and
Confidence interval those receiving standard therapy.
For the Rivers study, instead of examining the ratio of the
risks (the relative risk) we can obtain a confidence interval Matched samples
and carry out a significance test of the difference between Matched pair designs, as discussed in Statistics review 5 [9],
the risks. The proportion of those on early goal-directed can also be used when the outcome is categorical. For 51
Critical Care February 2004 Vol 8 No 1 Bewick et al.
In this situation, because the χ2 test does not take pairing into Breath test
consideration, a more appropriate test, attributed to
McNemar, can be used when comparing these correlated Oxoid test + – Total
proportions. + 40 8 (b) 48
– 4 (c) 32 36
For example, in the comparison of two diagnostic tests used
in the determination of Helicobacter pylori, the breath test Total 44 40 84(n)
and the Oxoid test, both tests were carried out in 84 patients
and the presence or absence of H. pylori was recorded for
each patient. The results are shown in Table 10, which indi-
cates that there were 72 concordant pairs (in which the tests
agree) and 12 discordant pairs (in which the tests disagree). For the example where b = 8, c = 4 and n = 84, the difference
The null hypothesis for this test is that there is no difference is calculated as 0.048 and the standard error as 0.041. The
in the proportions showing positive by each test. If this were approximate 95% confidence interval is therefore
true then the frequencies for the two categories of discordant 0.048 ± 1.96 × 0.041 giving –0.033 to 0.129. As this spans
pairs should be equal [5]. The test involves calculating the dif- 0, it again indicates that there is no difference in the propor-
ference between the number of discordant pairs in each cate- tion of positive determinations of H. pylori using the breath
gory and scaling this difference by the total number of and the Oxoid tests.
discordant pairs. The test statistic is given by:
Limitations
2 ( b − c) 2
χ = For a χ2 test of association, a recommendation on sample
b+c size that is commonly used and attributed to Cochran [5] is
Where b and c are the frequencies in the two categories of that no cell in the table should have an expected frequency of
discordant pairs (as shown in Table 10). The calculated test less than one, and no more than 20% of the cells should have
statistic is compared with a χ2 distribution with 1 degree of an expected frequency of less than five. If the expected fre-
freedom to obtain a P value. For the example b = 8 and c = 4, quencies are too small then it may be possible to combine
therefore the test statistic is calculated as 1.33. Comparing categories where it makes sense to do so.
this with a χ2 distribution gives a P value greater than 0.10,
indicating no significant difference in the proportion of posi- For two by two tables, Yates’ correction or Fisher’s exact test
tive determinations of H. pylori using the breath and the can be used when the samples are small. Fisher’s exact test
Oxoid tests. can also be used for larger tables but the computation can
become impossibly lengthy.
The test can also be carried out with a continuity correction
attributed to Yates [5], in a similar way to that described In the trend test the individual cell sizes are not important but
above for the χ2 test of association. The test statistic is then the overall sample size should be at least 30.
given by:
(b − c − 1)2 The analyses of proportions and risks described above
χ2 = assume large samples with similar requirement to the χ2 test
b+c of association [8].
and again is compared with a χ2 distribution with 1 degree of
freedom. For the example, the calculated test statistic includ- The sample size requirement often specified for McNemar’s
ing the continuity correct is 0.75, giving a P value greater test and confidence interval is that the number of discordant
than 0.25. pairs should be at least 10 [8].
References
1. Bewick V, Cheek L, Ball J: Statistics review 7: Correlation and
regression. Crit Care 2003, 7:451-459.
2. Everitt BS: The Analysis of Contingency Tables, 2nd ed. London,
UK: Chapman & Hall; 1992.
3. Whitley E, Ball J: Statistics review 2: samples and populations.
Crit Care 2002, 6:143-148.
4. Whitley E, Ball J: Statistics review 3: hypothesis testing and P
values. Crit Care 2002, 6:222-225.
5. Bland M: An Introduction to Medical Statistics, 3rd ed. Oxford,
UK: Oxford University Press; 2001.
6. Rivers E, Nguyen B, Havstad S, Ressler J, Muzzin A, Knoblich B,
Peterson E, Tomlanovich M; Early Goal-Directed Therapy Collabo-
rative Group: Early goal-directed therapy in the treatment of
severe sepsis and septic shock. N Engl J Med 2001,
345:1368-1377.
7. Whitley E, Ball J: Statistics review 6: Nonparametric methods.
Crit Care 2002, 6:509-513.
8. Kirkwood BR, Sterne JAC: Essential Medical Statistics, 2nd ed.
Oxford, UK: Blackwell Science Ltd; 2003.
9. Whitley E, Ball J: Statistics review 5: Comparison of means.
Crit Care 2002, 6:424-428.
53