
Some statistical methods useful in circulation research.

S Wallenstein, C L Zucker and J L Fleiss

Circ Res. 1980;47:1-9


doi: 10.1161/01.RES.47.1.1
Circulation Research is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231
Copyright © 1980 American Heart Association, Inc. All rights reserved.
Print ISSN: 0009-7330. Online ISSN: 1524-4571

The online version of this article, along with updated information and services, is located on the
World Wide Web at:
http://circres.ahajournals.org/content/47/1/1

Permissions: Requests for permissions to reproduce figures, tables, or portions of articles originally published in
Circulation Research can be obtained via RightsLink, a service of the Copyright Clearance Center, not the
Editorial Office. Once the online version of the published article for which permission is being requested is
located, click Request Permissions in the middle column of the Web page under Services. Further information
about this process is available in the Permissions and Rights Question and Answer document.

Reprints: Information about reprints can be found online at:


http://www.lww.com/reprints

Subscriptions: Information about subscribing to Circulation Research is online at:


http://circres.ahajournals.org//subscriptions/

Downloaded from http://circres.ahajournals.org/ at DALHOUSIE UNIVERSITY on December 2, 2012


JULY 1980
Circulation Research VOL. 47 NO. 1
An Official Journal of the American Heart Association

SPECIAL ARTICLE*
Some Statistical Methods Useful in
Circulation Research
SYLVAN WALLENSTEIN, CHRISTINE L. ZUCKER, AND JOSEPH L. FLEISS

SUMMARY Some statistical techniques for analyzing the kinds of studies typically reported in
Circulation Research are described. Particular emphasis is given to the comparison of means from more
than two populations, the joint effect of several experimentally controlled variables, and the analysis
of studies with repeated measurements on the same experimental units. Circ Res 47: 1-9, 1980

From the Division of Biostatistics, School of Public Health, College of Physicians and Surgeons, Columbia University, New York, New York.
Address for reprints: Sylvan Wallenstein, Ph.D., Division of Biostatistics, School of Public Health, Columbia University, 600 West 168 Street, New York, New York 10032.
* Editors' Note: Two years ago we summarized results Stanton A. Glantz had obtained by reviewing the use of statistical methods in papers published in the Journal. He found that statistical methods frequently were used incorrectly. We asked a group of statisticians to reevaluate the use of statistical methods in several volumes of the Journal and to summarize their findings. This Special Article by Wallenstein et al. presents their conclusions and also provides, in summary form, advice on which tests are most appropriate in terms of the design of the experiment and the questions posed.
Since our editorial appeared, there has been marked improvement in the selection and use of statistical methods to test the significance of results in papers published in the Journal. We thank both the authors and referees for their concern and effort.

GLANTZ (1980), in an examination of papers published in Circulation Research and Circulation, found that approximately half the studies that used statistical methodology did so incorrectly. He describes the causes and consequences of such misuse and presents methods to minimize the frequency of such errors, including an approach adopted by the Editors of Circulation Research (Rosen and Hoffman, 1978). Glantz's assertion of incorrect statistical methodology in Circulation Research was documented in unpublished correspondence to the Editors.

At the request of the Editors of this Journal, we attempted to verify these findings by reviewing articles published in volume 40 from 1977, as well as articles in volumes published 5 and 10 years before. In general, our review supports Glantz's contention.

By far, the most persistent defect found by Glantz and confirmed by our review was that one pair of techniques, the independent sample and paired sample t-tests, was used to the virtual exclusion of the more appropriate techniques of the analysis of variance. The statistical methods employed by most authors of the articles published in this Journal appear to have remained fairly static over the past 10 years, with an increase in the proportion of authors using statistical methods of any kind but with only a slight increase in the proportion of authors using newer (post-1940) techniques. This finding probably is not too different from what would characterize comparable biomedical Journals.

The purpose of this paper is to describe, in a didactic manner, statistical techniques that are more appropriate than simple t-tests for analyzing data from the kinds of studies typically reported in this Journal (as determined by the review cited above). All the techniques to be described can be performed with simple pocket or desk-top calculators. References to packaged computer programs are included for the benefit of investigators who have access to computers.

The paper will focus on the three kinds of studies most frequently reported in this Journal. In the first type of study, several different groups are compared. For the comparison of two groups, use of the simple t-test usually is correct, but for the comparison of three or more groups, more appropriate procedures will be suggested. In the second type of study, the so-called factorial study, the possible joint effect of several experimentally controlled variables (e.g., treatment, strain, sex) on outcome is examined. In the third type of study, the so-called repeat measurements study, the effect of treatment over time is of interest. Techniques of analysis appropriate to each kind of study will be considered and illustrated.

The use of these techniques often will result in more sensitive analyses of experimental effects; that is, it may be possible to detect as significant a given


difference using appropriate statistical procedures, but one may fail to do so with currently popular procedures. The use of appropriate analyses will provide greater depth to data exploration, since many hypotheses of interest can be tested within the framework of a single overall analysis. For example, when two or more treatments are given at several times, it is possible to test for overall treatment differences, and time differences, as well as the consistency of treatment differences over time.

Repeatedly performing significance tests in the same study inevitably increases the chance of mistakenly declaring significance. The methods presented here also safeguard against this source of error.

Overall Comparison of Several Independent Groups

An example of an experiment comparing several independent groups is a study reported by Coulson et al. (1977) in which five experimental groups were compared. These included a control group, a group with induced congestive heart failure (CHF), a group with induced right ventricular hypertrophy (RVH), a group with induced congestive heart failure followed by 30 days of recovery (CHFR), and a group with induced right ventricular hypertrophy followed by 30 days of recovery (RVHR). Table 1 presents the group sample sizes and the means and standard deviations for heart rate.

TABLE 1 Experimentally Induced Congestive Heart Failure or Right Ventricular Hypertrophy in Cats

Group     $n_i$   Heart rate* (beats/min)   $n_i(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$   $(n_i - 1)s_i^2$
Control   5       239 ± 29.07               80                                                3,380
CHF       5       182 ± 44.72               14,045                                            8,000
CHFR      5       231 ± 31.30               80                                                3,920
RVH       6       272 ± 19.60               8,214                                             1,920
RVHR      4       248 ± 36.00               676                                               3,888

Abbreviations: $n_i$ = number of observations for group i; $\bar{X}_{i\cdot}$ = mean for group i; $\bar{X}_{\cdot\cdot}$ = overall mean = 235; $s_i$ = standard deviation for group i. The original results reported by Coulson et al. (1977) were mean ± SE, where SE = $s_i/\sqrt{n_i}$.
* Results expressed as mean ± $s_i$.

Coulson et al. analyzed the data by performing t-tests for each pair of treatments, using as a critical value that for the t-statistic with degrees of freedom $n_i + n_j - 2$, where $n_i$ and $n_j$ are the sample sizes for groups i and j. Thus, to compare groups 1 and 2 at the P = 0.05 level, the critical value would be $t_{0.05}$ (df = 8) = 2.306. The t-statistics for all 10 pairwise comparisons are given in column 1 of Table 2.

This procedure has two defects. First, full use is not made of all information concerning variability within the groups. Second, if no overall differences exist between any of the groups, the investigator will have about a 30% chance of declaring at least one difference significant, instead of the 5% chance which use of the critical value 2.306 implies.

The lack of use of all information on variability can result in certain anomalies. For example, the observed mean difference between RVH and control was 33 beats/min, and the difference between CHF and CHFR was 49. Nevertheless, the t-ratio

TABLE 2 Results of Pairwise Comparison

                             t-Statistic                Results of simultaneous significance tests
                             Column 1       Column 2    Column 3     Column 4     Column 5   Column 6   Column 7
                                                        Bonferroni   Bonferroni
Group No. 1   Group No. 2    Conventional   Modified    m = 4        m = 10       Tukey      Dunnett    Scheffe
Control       CHF            2.39           2.77        *            NS           NS         *          NS
Control       CHFR           0.42           0.39        NS           NS           NS         NS         NS
Control       RVH            2.25           1.68        NS           NS           NS         NS         NS
Control       RVHR           0.42           0.41        NS           NS           NS         NS         NS
CHF           CHFR           2.01           2.38        NA           NS           NS         NA         NS
CHF           RVH            4.48           4.57        NA           *            *          NA         *
CHF           RVHR           2.39           3.49        NA           *            *          NA         *
CHFR          RVH            2.66           2.08        NA           NS           NS         NA         NS
CHFR          RVHR           0.84           0.78        NA           NS           NS         NA         NS
RVH           RVHR           1.38           1.14        NA           NS           NS         NA         NS

NA = test is not applicable.
* = statistically significant at P = 0.05 level; NS = not statistically significant at P = 0.05 level.

Critical values for above tests (df = 20)

Procedure         No. of comparisons tested   Critical 0.05 value   Derived from tables of
Nonsimultaneous   1                           2.086                 t-distribution
Bonferroni        4                           2.76                  t-distribution
Bonferroni        10                          3.12                  t-distribution
Tukey             10                          3.00                  Studentized range
Dunnett           4                           2.65                  Special table
Scheffe           10                          3.39                  F-distribution
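
As a rough cross-check of the critical values listed above, the following Python sketch recomputes three of them from quantities quoted in this article: the Tukey value from the tabulated studentized range (4.24), the Scheffe value from the tabulated F value of 2.87 for (4, 20) degrees of freedom, and the Bonferroni values through the normal-curve approximation t* = z + (z + z^3)/(4n) given in the section on the Bonferroni method. It then applies each critical value to the modified t-statistics printed in column 2. The names `bonferroni_critical`, `critical`, `modified_t`, and `significant_pairs` are illustrative assumptions, the Bonferroni figures are approximations rather than exact table values, and the Dunnett value is omitted because it requires a special table.

```python
from math import sqrt
from statistics import NormalDist

DF_WITHIN = 20  # degrees of freedom of the within-groups mean square
K_GROUPS = 5    # number of groups in Table 1


def bonferroni_critical(m, df, alpha=0.05):
    """Approximate two-sided critical value for m comparisons, using the
    normal-curve approximation t* = z + (z + z**3) / (4 * df)."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * m))
    return z + (z + z ** 3) / (4 * df)


critical = {
    "Bonferroni m=4": bonferroni_critical(4, DF_WITHIN),
    "Bonferroni m=10": bonferroni_critical(10, DF_WITHIN),
    "Tukey": 4.24 / sqrt(2),                 # tabulated studentized range / sqrt(2)
    "Scheffe": sqrt((K_GROUPS - 1) * 2.87),  # sqrt((k - 1) * tabulated F)
}

# Modified t-statistics as printed in column 2 of Table 2.
modified_t = {
    ("Control", "CHF"): 2.77,
    ("Control", "CHFR"): 0.39,
    ("Control", "RVH"): 1.68,
    ("Control", "RVHR"): 0.41,
    ("CHF", "CHFR"): 2.38,
    ("CHF", "RVH"): 4.57,
    ("CHF", "RVHR"): 3.49,
    ("CHFR", "RVH"): 2.08,
    ("CHFR", "RVHR"): 0.78,
    ("RVH", "RVHR"): 1.14,
}


def significant_pairs(procedure, control_only=False):
    """Return the pairs whose |t| exceeds the procedure's critical value."""
    cutoff = critical[procedure]
    return [
        pair
        for pair, t in modified_t.items()
        if abs(t) > cutoff and (not control_only or "Control" in pair)
    ]


for name, value in critical.items():
    # The m = 4 Bonferroni set covers only the comparisons with control.
    only_control = name == "Bonferroni m=4"
    pairs = significant_pairs(name, only_control)
    print(f"{name}: critical value {value:.2f}, significant pairs: {pairs}")
```

Deriving the decisions from one dictionary of critical values makes it easy to see that a larger set of comparisons raises the threshold that each individual t-statistic must exceed.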


was 2.25 for the former comparison and 2.01 for the latter, a difference that cannot be explained by the small difference in sample sizes.

The second defect is the more serious, because the investigator has nearly a 1 in 3 chance of declaring some differences between five groups to be significant, even if no differences really exist.

To remedy these defects, an overall test based on the analysis of variance should first be performed to give a single test statistic for differences between all five groups. If, and only if, this test indicates that some differences exist, should modified t-tests be performed to identify the sources of these differences.

The Analysis of Variance

The analysis of variance partitions the total variability in the experiment (the total sum of squares) into components (sums of squares) due to between-treatment and within-treatment variability. These sums of squares are then both divided by the appropriate degrees of freedom to yield a mean square between groups and a mean square within groups. The latter mean square is an average of the variances within each of the groups and is a measure of biological variability. (A variance is simply the square of a standard deviation.) The ratio of the mean square between groups to the mean square within groups, the F-statistic or F-ratio, is a measure of differences among groups. If there are no real differences among the groups, the value of the F-statistic should be close to 1.0. If the F-value is larger than the appropriate critical value (which depends on the degrees of freedom), we conclude that it is unlikely that the observed differences are due to chance alone and state that the differences are statistically significant. All introductory and intermediate level statistics texts contain tables of critical values for the F-statistic.

Table 3 presents an analysis of variance (ANOVA) table and formulas that can be used after the group means and standard deviations have been calculated. The ANOVA table for the data in Table 1 is given at the bottom of Table 3. As indicated on the table, the F-ratio obtained, 5.47, is greater than the tabulated P = 0.01 critical value. Therefore, we conclude that the group means are significantly different at P < 0.01.

Suppose the analysis of variance indicates the significance of differences among groups. If a comparison between groups i and j is desired, it should be performed using the modified t-statistic $t = (\bar{X}_i - \bar{X}_j) / (s\sqrt{1/n_i + 1/n_j})$, where $s^2$, the mean square within groups, is taken from the analysis of variance table, and $\bar{X}_i$ and $n_i$ are the mean and sample size for group i. These modified t-statistics are listed in column 2 of Table 2. Note that the 33 beats/min difference between RVH and control is associated with a modified t-statistic of 1.68, and the 49 beats/min difference between CHF and CHFR is associated with a modified t-statistic of 2.38.

Appropriate critical values for this statistic are discussed in the section on simultaneous multiple comparison procedures.

The overall F-test and the modified t-statistics, together with an indication of significance based on the procedures to be described below, can be obtained using program ONEWAY of SPSS (Nie et al., 1975).

Assumptions and Alternative Procedures

The analysis of variance assumes that the measurements are obtained under independent conditions, that the data are distributed normally, and that each group has the same underlying standard deviation. (Of course, the sample standard deviations will vary because of sampling variation.)

The assumption of independence is crucial, whereas the assumption of normality is less crucial, especially for large sample sizes.

Statisticians describe the importance of the assumptions in terms of robustness. For a robust procedure, small or moderate departures from the assumptions (such as a ratio of 3:2 in the standard deviations) will have a small effect on the validity of the procedure (e.g., the actual P value may be 0.045 instead of 0.05). When the sample sizes are

TABLE 3 One-Way Analysis of Variance Table

Source of variation   Degrees of freedom   Sum of squares (SS)                                                        Mean squares (MS)      F-ratio
Between groups        k - 1                $\sum_{i=1}^{k} n_i(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$            SS (between)/(k - 1)   MS between / MS within
Within groups         N - k                $\sum_{i=1}^{k} (n_i - 1)s_i^2$                                            SS (within)/(N - k)

Analysis of Variance for Data in Table 1

Source of variation   Degrees of freedom   Sum of squares (SS)   Mean squares (MS)   F-ratio
Between groups        4                    23095                 5773.75             5.47
Within groups         20                   21108                 1055.40

Critical value $F_{0.01}(4, 20)$ = 4.43. Abbreviations: $X_{ij}$ = measurement on the jth unit in the ith group; k = number of groups; $n_i$ = number of units in the ith group; $\bar{X}_{i\cdot}$ = mean of $n_i$ measurements in ith group = $\sum_{j=1}^{n_i} X_{ij}/n_i$; $s_i^2$ = variance of ith group = $\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2/(n_i - 1)$; N = number of units in entire experiment = $\sum_{i=1}^{k} n_i$; $\bar{X}_{\cdot\cdot}$ = overall mean = $\sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij}/N = \sum_{i=1}^{k} n_i\bar{X}_{i\cdot}/N$.
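
The formulas of Table 3 can be sketched in a few lines of Python that work directly from the group sizes, means, and standard deviations of Table 1. The function names `one_way_anova` and `modified_t` and the dictionary layout are illustrative assumptions, not part of any package cited in this article. Because the sketch uses the unrounded overall mean (235.36 rather than 235), its sums of squares differ slightly from the tabulated ones, while the F-ratio agrees to two decimals.

```python
from math import sqrt

# Summary statistics from Table 1: group -> (n, mean, standard deviation).
groups = {
    "Control": (5, 239.0, 29.07),
    "CHF": (5, 182.0, 44.72),
    "CHFR": (5, 231.0, 31.30),
    "RVH": (6, 272.0, 19.60),
    "RVHR": (4, 248.0, 36.00),
}


def one_way_anova(summary):
    """One-way ANOVA computed from per-group n, mean, and SD (Table 3)."""
    k = len(summary)
    n_total = sum(n for n, _, _ in summary.values())
    grand_mean = sum(n * mean for n, mean, _ in summary.values()) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in summary.values())
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in summary.values())
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "df_between": k - 1,
        "df_within": n_total - k,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


def modified_t(summary, ms_within, group_i, group_j):
    """Modified t-statistic: the mean difference scaled by the pooled
    within-groups mean square from the overall ANOVA."""
    n_i, mean_i, _ = summary[group_i]
    n_j, mean_j, _ = summary[group_j]
    return (mean_i - mean_j) / sqrt(ms_within * (1 / n_i + 1 / n_j))


anova = one_way_anova(groups)
print(f"F({anova['df_between']}, {anova['df_within']}) = {anova['f_ratio']:.2f}")

t_rvh_control = modified_t(groups, anova["ms_within"], "RVH", "Control")
t_chfr_chf = modified_t(groups, anova["ms_within"], "CHFR", "CHF")
print(f"RVH vs control: t = {t_rvh_control:.2f}")
print(f"CHFR vs CHF:    t = {t_chfr_chf:.2f}")
```

Using one pooled mean square for every pair is what removes the anomaly of the conventional t-tests: the larger mean difference now always yields the larger statistic when the sample sizes are similar.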


nearly equal, the analysis of variance is a robust procedure with respect to the assumption of equal variability. However, in a study with markedly different sample sizes (the largest being more than twice the smallest), the analysis of variance and the subsequent pairwise comparisons are not robust with respect to the assumption of equal variability. In many studies, a transformation of the data will result in the assumptions being more nearly satisfied, but in other cases, an entirely different method (a nonparametric procedure) based on analyzing the ranks of the observations may be in order.

In biological applications, the need for a transformation frequently is indicated by noticing that the standard deviation increases as the mean does, or that, for data that are non-negative, the standard deviation exceeds the mean. (The latter finding is indicative of skewness.) For these types of data, the two most useful transformations are the square root transformation, valid if the data contain no negative quantities, and the logarithmic transformation, which is suitable if the data contain no zeroes and no negative numbers. The square root transformation usually is preferred when the measurements are in the nature of frequencies or counts, and the logarithmic transformation usually is preferred when the measurements are of enzyme activity or other biological characteristics having a time component for which the standard deviation is approximately proportional to the mean. For purposes of interpretation, it is useful to transform the mean of the transformed data back to the original units.

An alternative class of techniques which can be used to analyze the type of data just described, as well as a variety of other trials with minimal assumptions on the nature of the data, is the so-called nonparametric procedures. All of these procedures are based on ranking the N observations from 1 for the lowest value to N for the greatest value. For the kind of study considered so far, the mean rank in each treatment group then is calculated, and a summary statistic, the Kruskal-Wallis test statistic, then is computed based on the differences between these observed mean ranks. If the differences among groups are significant, multiple comparisons can be performed as described by Miller (1966) or by Hollander and Wolfe (1973). The technique makes no assumptions regarding normality, is almost as sensitive as the t-test when the data are, in fact, normally distributed, and can be much more powerful otherwise. A defect in the method is that it is not based on descriptive statistics in common use, and thus description of results is more difficult.

Simultaneous Multiple Comparisons

Simultaneous multiple comparison procedures control the error rate associated with an entire set of comparisons rather than the error rate for each comparison. The set could consist of all comparisons to control, an arbitrary preplanned number of comparisons, all pairwise comparisons, or more complex comparisons. For a given set, the probability of falsely declaring one or more differences to be significant, when, in fact, all means are equal, is set at a small value, usually 0.05.

It should be noted that large sets (i.e., open-ended investigations) lead to larger critical values than do smaller sets. Thus, an experiment intended to analyze only the differences between control and each of the four other groups in Table 1 would require a smaller critical value (for those comparisons) than a trial intended a priori to investigate each of the 10 possible pairs of differences between treatments. Of the four simultaneous test procedures discussed below, the first is the simplest and is best suited for a small number of preplanned comparisons. The second (Tukey's test) is best suited to the case for which all pairwise comparisons are of interest, and the third (Dunnett's test) is to be used only in comparing the control group to each of the other groups. The Scheffe test, the last we shall discuss, is intended to evaluate arbitrary combinations of groups against each other. The reader interested in more detail on these or other methods should consult Miller (1966).

The Bonferroni Method

Based on an elementary inequality called Bonferroni's inequality, a conservative critical value for the modified t-statistics is obtained from the tables of the t-distribution using a significance level of P/m, where m is the number of comparisons between groups to be performed. (The degrees of freedom are, as above, those for the mean square for within group variation from the ANOVA table.) For example, if for the data in Table 1 only four comparisons were of interest, the P value of 0.05 would be replaced by 0.05/4 = 0.0125, which, with 20 degrees of freedom, yields a critical value of 2.76. Column 3 of Table 2 indicates which of the m = 4 comparisons with control would be significant if this procedure were used. On the other hand, if all 10 pairwise comparisons were of interest, the P level becomes 0.05/10 = 0.005, and the critical value becomes 3.12. Note that the determination as to which set of comparisons is to be tested (four or 10 comparisons) must be made beforehand, and not by inspection of the results. Inspection of column 4 indicates that, for m = 10, only two differences can be declared significant, in comparison to the four differences that would be declared significant in column 2 if no control over simultaneous testing were used.

[In most cases, the critical Bonferroni value cannot be obtained from conventional tables of the t-distribution but may be approximated from widely available tables of the normal curve by $t^* = z + (z + z^3)/(4n)$, where n is the degrees of freedom and z is the critical normal curve value for P/m. For the above example, the critical z-value at P = 0.0125 is


2.50 and, thus, $t^* = 2.50 + (2.5 + 2.5^3)/80 = 2.73$, in close agreement with the exact value of 2.76 given above.]

Tukey's Method

For the procedure suggested by Tukey (1949), the critical value for the modified t-statistic is obtained by referring to a value in a table of the distribution of the "studentized range." These tables are available in many intermediate level statistical texts, such as Snedecor and Cochran (1967). The value obtained from the table then is divided by $\sqrt{2}$. For the current example, the value taken from the table is 4.24, and division by $\sqrt{2}$ yields a critical value of 3.00. The differences between treatments found to be significant by means of the Tukey procedure are given in column 5 of Table 2. In this example, use of either Bonferroni's procedure (with m = 10) or Tukey's procedure gives the same result with respect to significance of differences. Although this test theoretically requires equal sample sizes, it gives reasonably accurate results if, as in this example, the sample sizes are nearly equal.

Dunnett's Method

This procedure is applicable when there is a control group and the investigator's only interest is in comparing each of the other groups to the single control group. The method is that of Dunnett (1964), who also gives special tables for the critical values. Again, as with Tukey's method, this procedure is in theory limited to the case of equal sample sizes, or at least equal sample sizes in the treatment groups, with a possibly greater number in the control group. If, in the current example, each treatment group were compared to the control group, the critical value would be 2.65. Column 6 of Table 2 gives the results of the tests of significance if each treatment were to be compared to control using this procedure. Note that Dunnett's method gives, in this example, the same results as the Bonferroni method for all four comparisons to control.

Scheffe's Method

This procedure, suggested by Scheffe (1959), is intended for complicated comparisons such as the mean of groups 1 and 3 vs. the mean of groups 2, 4, and 5. The critical P = 0.05 value for this test is $\sqrt{(k - 1)F_{0.05}(k - 1, N - k)}$, where k is the number of groups, N is the number of experimental units, and $F_{0.05}(k - 1, N - k)$ is the critical value for the overall F-test. In the present case, $F_{0.05}(4, 20)$ is 2.87, and the Scheffe critical value is 3.39. The last column of Table 2 gives the results of the tests on pairs of treatments if this procedure were used. In this particular example, the results agree with those of Tukey's test and the Bonferroni procedure, although it should be noted that the critical value is larger.

Recommendation

The Bonferroni procedure is recommended for general use since it is easiest to apply, has the widest range of applications, and gives critical values that will be lower than those of other procedures if the investigator is able to limit the number of comparisons, and that will be only slightly larger than those of other procedures if many comparisons are made.

Factorial Designs

The 2 x 2 Design

In the simplest type of factorial study, usually denoted as the 2 x 2 factorial, two treatments are compared in two different populations; i.e., responses are cross-classified by treatment and population. The populations could represent sex, strain of the animal, or even a second pair of experimental conditions.

As an example of a 2 x 2 design, consider data from a study by Cutilletta et al. (1977) in which the effect of nerve growth factor serum (NGFAS) was compared with control serum (SHAM) in spontaneous hypertensive (SH) and normotensive (WKY) rats. Descriptive statistics (mean ± standard deviation) for kidney renin concentration are shown in Table 4.

TABLE 4 Descriptive Statistics (Mean ± Standard Deviation) for Kidney Renin Concentration Data

TRT     SH                       WKY
SHAM    2.41 ± 0.17 (n = 10)     2.95 ± 0.23 (n = 8)
NGFAS   4.24 ± 0.31 (n = 8)      2.89 ± 0.27 (n = 6)

Source: Cutilletta et al. (1977).

t-Tests performed to compare the two treatments first in SH rats and then in WKY rats lead to the conclusion of a difference due to NGFAS in SH but not in WKY rats. However, in general, this procedure is not the best strategy for at least two reasons: First, mere presence of a significant difference in one strain and its absence in the other strain does not prove conclusively that the groups (strains) differ in the nature of their response. For example, differences between treatments could be the same for both strains, but differences may be statistically significant only in the strain with the larger sample size. Even if the sample sizes are equal, a small difference between the values of the t-statistic (e.g., between 1.90 for one strain and 2.10 for the other) is clearly not indicative of a real difference between treatment effects in the two strains. Second, it is possible that both strains show the same statistically nonsignificant effect, but, when the results are combined properly, the effect is significant.

The factorial design allows the following questions to be addressed for the above experiment. Is


the difference between NGFAS and SHAM the same for SH and WKY rats; i.e., is there an interaction between strain and treatment? Averaged over both strains, is there an effect due to NGFAS?

To answer these questions, an analysis of variance table is constructed that partitions the total variability into components (sums of squares) due to treatments, strains, a treatment-strain interaction, and a within-groups term indicative of animal-to-animal variation. These sums of squares then are divided by their appropriate degrees of freedom to yield mean squares. The significance of each term then is evaluated, as in the one-way design (Table 3), by dividing each mean square by the mean square within groups and then comparing this resulting F-statistic with the critical value of F with the appropriate degrees of freedom.

A general procedure for calculating sums of squares for this design will be given below. The analysis of variance table for these data is presented in Table 5 and indicates that treatment-strain interaction is significant (P < 0.01). Thus, we have shown that the treatment difference is not the same in the two strains, and further analysis now would be directed toward tests for treatment differences within each strain. To perform this test when there are two treatments per strain, one should form the following t-statistic for each strain. The numerator of t is equal to the difference between the treatment means for that strain. The denominator of t is the square root of the product of the mean square within groups and the sum of the reciprocals of the sample sizes.

Thus, for SH rats, $t = (4.24 - 2.41)/\sqrt{0.0597(1/8 + 1/10)} = 15.79$, whereas, for WKY rats, $t = (2.89 - 2.95)/\sqrt{0.0597(1/6 + 1/8)} = -0.45$.

The appropriate critical values for the test should take into account the fact that two separate comparisons were made. Using the Bonferroni method of multiple comparisons, we use the critical value 2.36. [The value is obtained from a table of t with 28 degrees of freedom (corresponding to the degrees of freedom within groups in Table 5) evaluated at a P value of 0.05/2 = 0.025.] In this trial, we reach the same conclusion as Cutilletta et al., i.e., that differences were significant in SH but not in WKY rats.

If the interaction term were not significant, the overall test of significance for treatment differences and strain differences would be based on the F-statistics given in the first two rows of Table 5.

The General Factorial Design

The design described above can be extended easily to the case in which r treatments are compared in c strains or populations. Such a design is called an r x c factorial (r standing for row classification, c for column classification). A typical layout of the resulting means for such a design is given in Table 6.

TABLE 6 Data Layout for Factorial Design

Mean values per cell

Treatment   Strain 1         Strain 2         ...   Strain c         Unweighted row means
1           $\bar{X}_{11}$   $\bar{X}_{12}$   ...   $\bar{X}_{1c}$   $\bar{X}_{1\cdot} = \sum_{j=1}^{c} \bar{X}_{1j}/c$
2           $\bar{X}_{21}$   $\bar{X}_{22}$   ...   $\bar{X}_{2c}$   $\bar{X}_{2\cdot} = \sum_{j=1}^{c} \bar{X}_{2j}/c$
...
r           $\bar{X}_{r1}$   $\bar{X}_{r2}$   ...   $\bar{X}_{rc}$   $\bar{X}_{r\cdot} = \sum_{j=1}^{c} \bar{X}_{rj}/c$

Unweighted column means: $\bar{X}_{\cdot j} = \sum_{i=1}^{r} \bar{X}_{ij}/r$
Unweighted overall mean: $\bar{X}_{\cdot\cdot} = \sum_{j=1}^{c} \bar{X}_{\cdot j}/c = \sum_{i=1}^{r} \bar{X}_{i\cdot}/r$

The computational procedure for an r x c design when all the sample sizes are equal is straightforward and will be a special case of the procedure to be described below. For the case in which r or c is equal to 2, Snedecor and Cochran (1967, pages 483-489) give an analysis that can be performed by hand calculations. To perform an exact analysis for the case in which the sample sizes vary and both r > 2 and c > 2 requires, in general, the use of one of several computer packages, such as ANOVA of SPSS (Nie et al., 1975) or GLM of SAS (1979). However, several approximate procedures can be performed using hand calculators. Below we discuss one such procedure, the unweighted analysis of variance (see Snedecor and Cochran, 1967, pages 475-477), appropriate for the case in which the sample sizes are not too unequal. (A ratio of 2:1 between the largest and smallest sample size is usually considered adequate.)

One problem caused by unequal sample sizes is that of calculating a descriptive measure of the overall treatment mean taken over all strains (columns). Simply dividing the overall sum by the number of animals receiving that treatment can lead to misleading values, as treatments that were received predominantly by a certain strain would reflect unduly the high or low values for that strain. A way of avoiding this problem is to compute the unweighted mean of the mean values for a treatment across all strains. For example, for the data in

TABLE 5 Analysis of Variance for Kidney Renin Concentration Data

Source of variation   Sums of squares   Degrees of freedom   Mean squares   Calculated F
Treatment             5.99              1                    5.99           100.3**
Strain                0.619             1                    0.619          10.4**
Interaction           12.68             1                    12.68          212.4**
Within groups         1.671             28                   0.0597         -

Source: Cutilletta et al. (1977).
** Significant at P < 0.01 level (critical F-value = 7.64).
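
The within-strain follow-up tests for the kidney renin data can be sketched as below, assuming the cell means and sample sizes of Table 4, the within-groups mean square of Table 5, and the Bonferroni critical value of 2.36 quoted in the text for two comparisons with 28 degrees of freedom. The dictionary layout and the function name `within_strain_t` are illustrative assumptions.

```python
from math import sqrt

MS_WITHIN = 0.0597          # within-groups mean square from Table 5 (28 df)
BONFERRONI_CRITICAL = 2.36  # critical value quoted in the text for m = 2

# Table 4 cell summaries: strain -> treatment -> (mean, sample size).
cells = {
    "SH": {"SHAM": (2.41, 10), "NGFAS": (4.24, 8)},
    "WKY": {"SHAM": (2.95, 8), "NGFAS": (2.89, 6)},
}


def within_strain_t(strain):
    """t-statistic for NGFAS vs. SHAM within one strain, using the pooled
    within-groups mean square rather than the two cells' own variances."""
    mean_sham, n_sham = cells[strain]["SHAM"]
    mean_ngfas, n_ngfas = cells[strain]["NGFAS"]
    standard_error = sqrt(MS_WITHIN * (1 / n_ngfas + 1 / n_sham))
    return (mean_ngfas - mean_sham) / standard_error


results = {strain: within_strain_t(strain) for strain in cells}
for strain, t in results.items():
    verdict = "significant" if abs(t) > BONFERRONI_CRITICAL else "not significant"
    print(f"{strain}: t = {t:.2f} ({verdict} at the Bonferroni 0.05 level)")
```

Pooling the error term over all four cells gives each test 28 degrees of freedom, more than either strain would supply on its own.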


Table 4, the unweighted mean for the SHAM treatment is (2.41 + 2.95)/2 = 2.68.

The analysis of unweighted means is based on squaring the difference between each unweighted mean and the overall unweighted mean, and then multiplying this squared difference by $n_h$, a measure of "typical" sample size, where

$$n_h = \frac{rc}{\sum_{i=1}^{r} \sum_{j=1}^{c} 1/n_{ij}}$$

r and c are the numbers of rows and columns, and $n_{ij}$ is the sample size for row (treatment) i and column (strain) j. (The reader may recognize $n_h$ as the harmonic mean of the sample sizes.)

As above, the analysis of variance partitions the total variability into a row effect, a column effect, an interaction term that determines whether differences between rows are the same for each column, and a within-group term measuring animal-to-animal variation. Each term is tested for significance by comparing its mean square to the mean square for within-group variation. If the interaction term is significant, the conclusion would be that treatment differences varied from one strain to another. Comparisons between treatments then could be performed separately for each strain using the analysis of variance procedures described above. If the interaction term is not significant, a test for the effect of treatments (rows) averaged over strains (columns) would be performed by testing for significance the ratio of the mean square for treatments to the mean square for within groups.

The computations required for the analysis of variance of unweighted means are given in Table 7. As noted previously, Table 5 illustrates these computations for the kidney renin data.

The theoretical assumptions needed for the factorial analysis are that the measurements be obtained under independent conditions, that the data be distributed normally, and that the variability within each group be the same. If these latter assumptions do not appear to be realized, transformations may be employed as described earlier. The methods described here can be extended to the analysis of the joint effect of more than two factors. However, in these more complex designs, there is more chance of marked imbalance of sample sizes and of the introduction of other complexities not discussed here. The investigator is urged to consult a biostatistician before embarking on such trials.

Repeat Measurements Studies

The paired t-test is used extensively in the literature in comparing pre- and posttreatment scores on the same group of n animals. Techniques are available for the more general study in which the same n animals or patients are observed under t (t > 2) different conditions, or at t different times. These techniques can be generalized to studies in which different groups each are observed at the t time points.

As an example of such a design, we consider a trial by Yellin et al. (1979) in which mitral regurgitant orifice areas were compared in five dogs (n = 5) at peak flow and at three time points following peak flow (t = 4 time points all told). The data are listed in Table 8. In this example, the main interests were in changes from peak flow and in the evaluation of time trends. Before considering the question of actual trends, we will analyze the data for any differences between the time points, an analysis that would be most appropriate for the case in which, instead of time, different treatments or experimental conditions were being compared in the same animals.

Test for Overall Differences over Time

In the repeated measurements analysis of variance, the total variability is partitioned into differences between experimental units, variation over time, and residual variability. Table 9 indicates the computational formulas used in the analysis. Under assumptions to be described below, the hypothesis of no variation over time is rejected if the F-statistic formed by the ratio of the mean square for time to the mean square for residual exceeds the critical F value with degrees of freedom (t - 1) and (t - 1)(n - 1). (The significance of differences between

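TABLE 7 Analysis of Variance Table for Two-Way Factorial Design

| Source | SS | df |
|---|---|---|
| Row (treatment) | $n_h c \sum_{i=1}^{r} (\bar{X}_{i.} - \bar{X}_{..})^2$ | r - 1 |
| Column (strain) | $n_h r \sum_{j=1}^{c} (\bar{X}_{.j} - \bar{X}_{..})^2$ | c - 1 |
| Interaction | $n_h \sum_{i=1}^{r} \sum_{j=1}^{c} (\bar{X}_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2$ | (r - 1)(c - 1) |
| Within groups | $\sum_{i=1}^{r} \sum_{j=1}^{c} (n_{ij} - 1) s_{ij}^2$ | N - rc |

Abbreviations: $s_{ij}$ = standard deviation for row (treatment) i, column (strain) j; $n_{ij}$ = sample size for row i, column j; $n_h$ = harmonic mean; N = total sample size.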
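The computations of Table 7 lend themselves to a short program. The Python sketch below is offered only as an illustration: it takes the mean, standard deviation, and sample size of each treatment-by-strain group and returns the sums of squares, degrees of freedom, and F ratios of the unweighted means analysis. The `Cell` container and the function name are hypothetical, and the within-groups sum of squares is assumed to be the usual pooled form, the sum of (n_ij - 1) s_ij^2, which is what the abbreviations under Table 7 suggest. In the example, only the SHAM means 2.41 and 2.95 come from the text; the second row, the standard deviations, and the sample sizes are invented and are not the data of Cutilletta et al.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Summary of one treatment-by-strain group (hypothetical container)."""
    mean: float
    sd: float
    n: int


def unweighted_means_anova(cells):
    """Unweighted means analysis for an r x c grid of Cell summaries."""
    r, c = len(cells), len(cells[0])
    flat = [cell for row in cells for cell in row]

    # Harmonic mean of the cell sizes: n_h = rc / sum(1 / n_ij)
    n_h = r * c / sum(1.0 / cell.n for cell in flat)

    # Unweighted means are plain averages of the cell means
    row_means = [sum(cell.mean for cell in row) / c for row in cells]
    col_means = [sum(cells[i][j].mean for i in range(r)) / r for j in range(c)]
    grand = sum(row_means) / r

    ss = {
        "row": n_h * c * sum((m - grand) ** 2 for m in row_means),
        "column": n_h * r * sum((m - grand) ** 2 for m in col_means),
        "interaction": n_h * sum(
            (cells[i][j].mean - row_means[i] - col_means[j] + grand) ** 2
            for i in range(r) for j in range(c)
        ),
        # Assumed pooled form: sum of (n_ij - 1) * s_ij^2
        "within": sum((cell.n - 1) * cell.sd ** 2 for cell in flat),
    }
    df = {
        "row": r - 1,
        "column": c - 1,
        "interaction": (r - 1) * (c - 1),
        "within": sum(cell.n for cell in flat) - r * c,
    }
    ms_within = ss["within"] / df["within"]
    # Each effect is tested against the within-groups mean square
    f_ratio = {k: (ss[k] / df[k]) / ms_within
               for k in ("row", "column", "interaction")}
    return {"n_h": n_h, "row_means": row_means, "ss": ss, "df": df, "F": f_ratio}


# Row 0 holds the SHAM means quoted in the text; every other number is made up.
example = [
    [Cell(2.41, 0.25, 8), Cell(2.95, 0.30, 6)],
    [Cell(1.10, 0.20, 7), Cell(3.40, 0.28, 9)],
]
result = unweighted_means_anova(example)
print("unweighted SHAM mean:", round(result["row_means"][0], 2))
print("degrees of freedom:", result["df"])
print("F ratios:", {k: round(v, 1) for k, v in result["F"].items()})
```

Working from group summaries rather than raw observations mirrors the hand calculation: nothing beyond the means, standard deviations, and sample sizes of a table such as Table 4 is needed.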


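TABLE 8 Mitral Regurgitant Orifice Areas [from Yellin et al. (1979)]

| Dog | Time 1 (peak flow) | Time 2 | Time 3 | Time 4 | Mean = $\bar{X}_{i.}$ | $\bar{X}_{i.} - \bar{X}_{..}$ | $b_i$ |
|---|---|---|---|---|---|---|---|
| 1 | 33 | 42 | 31 | 30 | 34 | 1.25 | -6 |
| 2 | 36 | 43 | 32 | 27 | 34.5 | 1.75 | -8 |
| 3 | 36 | 39 | 27 | 23 | 31.25 | -1.50 | -8 |
| 4 | 22 | 25 | 16 | 9 | 18 | -14.75 | -8 |
| 5 | 51 | 60 | 39 | 34 | 46 | 13.25 | -13 |
| Mean $\bar{X}_{.j}$ | 35.6 | 41.8 | 29 | 24.6 | 32.75 = $\bar{X}_{..}$ | | -8.6 = $\bar{b}$ |
| $\bar{X}_{.j} - \bar{X}_{..}$ | 2.85 | 9.05 | -3.75 | -8.15 | | | |

$b_i$ = slope computed from times 2, 3, and 4.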
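The slope column of Table 8 can be reproduced from the rows of the table. The Python sketch below is a minimal illustration, not a prescribed implementation: it fits a least-squares line to times 2, 3, and 4 for each dog and then forms the ratio of the mean slope to its standard error, which is the quantity used in the section "Test for Trend". The names `slope` and `trend_t_ratio` are hypothetical; the data are those of Table 8.

```python
import math
from statistics import mean, stdev

# Orifice areas from Table 8: one row per dog, columns are time points 1-4
AREAS = [
    [33, 42, 31, 30],
    [36, 43, 32, 27],
    [36, 39, 27, 23],
    [22, 25, 16, 9],
    [51, 60, 39, 34],
]


def slope(times, values):
    """Least-squares slope of values regressed on times."""
    t_bar, x_bar = mean(times), mean(values)
    numerator = sum((x - x_bar) * (t - t_bar) for t, x in zip(times, values))
    denominator = sum((t - t_bar) ** 2 for t in times)
    return numerator / denominator


def trend_t_ratio(slopes):
    """Mean slope divided by its standard error, s_b / sqrt(n)."""
    n = len(slopes)
    return mean(slopes) * math.sqrt(n) / stdev(slopes)


times = [2, 3, 4]                                   # time points after peak flow
slopes = [slope(times, row[1:]) for row in AREAS]   # one slope per dog
b_bar = mean(slopes)
s_b = stdev(slopes)                                 # sample SD, divisor n - 1
t_ratio = trend_t_ratio(slopes)

print("slopes:", [round(b, 2) for b in slopes])
print("mean slope:", round(b_bar, 2))
print("SD of slopes:", round(s_b, 2))
print("t ratio:", round(t_ratio, 2))
```

Evaluated on these data, the slopes agree with the last column of Table 8, and the ratio comes out at roughly -7.4 on 4 degrees of freedom.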

experimental units rarely is tested, because it usually is taken for granted.) For our example t = 4 and n = 5, giving a critical value of $F_{0.05}(3, 12)$ = 3.49. The analysis for the data in Table 8 is presented in Table 10 and indicates significant variation across time.

The test described above theoretically requires that the correlations between all the time points be the same, an assumption that rarely is met in practice. Greenhouse and Geisser (1954) give a conservative procedure which uses the same test statistic but requires a much larger critical value, $F_{0.05}(1, n - 1)$. In this example, the critical value of 3.49 would be replaced by 7.71. Wallenstein and Fleiss (1979) give a procedure that is less conservative but requires interpolation in tables of the F distribution. It assumes a "damping-out" of the correlations over time.

Test for Trend

Yellin et al. (1979) note that the data following peak flow for each animal in Table 8 can be described by a straight line and suggest that one compute the regression coefficient and perform a test of significance for each animal. Although the results for this study are clear-cut, it is possible to envision cases in which only three of five animals showed a "significant" trend, preventing the statement of firm conclusions. An alternative procedure calls first for fitting a straight line to the data for each experimental unit. The mean, $\bar{b}$, and the standard deviation, $s_b$, of the n regression coefficients are computed. If there is no time trend, $\bar{b}$ should be close to zero. If the t-statistic, $t = \bar{b}\sqrt{n}/s_b$, exceeds in absolute value the critical value of t with n - 1 degrees of freedom, then there is evidence of a significant time trend.

The fitting of a straight line to data is a standard statistical procedure covered in most texts. Assuming that animal i has m pairs of observations, $(t_1, x_1), (t_2, x_2), \ldots, (t_m, x_m)$, where t represents time and x, response, first calculate the mean time, $\bar{t}$, and the mean response, $\bar{x}$. Then compute $b_i$, the slope for animal i, using the formula

$$b_i = \frac{\sum_{j=1}^{m} (x_j - \bar{x})(t_j - \bar{t})}{\sum_{j=1}^{m} (t_j - \bar{t})^2}$$

For example, for time points 2-4 of animal 2 in Table 8, $\bar{t}$ = (2 + 3 + 4)/3 = 3, $\bar{x}$ = (43 + 32 + 27)/3 = 34, and $b_2$ = [(43 - 34)(2 - 3) + (32 - 34)(3 - 3) + (27 - 34)(4 - 3)] / [(2 - 3)^2 + (3 - 3)^2 + (4 - 3)^2] = [(9)(-1) + (-2)(0) + (-7)(1)] / 2 = -8.

For the data at hand, linearity is hypothesized for the times $t_1$ = 2, $t_2$ = 3, and $t_3$ = 4 following peak flow. The values of the individual regression coefficients are given in the final column of Table 8. The mean slope is $\bar{b}$ = -8.6, the standard deviation of the slopes is $s_b$ = 2.6, and the t ratio is $t = -8.6\sqrt{5}/2.6 = -7.4$, which indicates a highly significant trend toward decreasing values after peak flow.

The Multi-Group Repeated Measurements Design

The techniques described above can be extended to the design in which one experimental group is observed at t repeated time points following one intervention, and other experimental groups are followed at the same t points following other interventions. Here, interest is focused on the differences between interventions with respect to the time trends, rather than on time trends per se.

It often is appropriate to summarize the time trends per experimental group by two quantities: the mean of all measurements, and the slope. Comparison of the means is informative of overall differences between groups, and comparison of the

TABLE 9 Analysis of Variance for Randomized Block Design

| Source of variation | Sum of squares | df |
|---|---|---|
| Row (dog) | $t \sum_{i=1}^{n} (\bar{X}_{i.} - \bar{X}_{..})^2$ | n - 1 |
| Column (time) | $n \sum_{j=1}^{t} (\bar{X}_{.j} - \bar{X}_{..})^2$ | t - 1 |
| Residual | Subtraction | (n - 1)(t - 1) |
| Total | $\sum_{i=1}^{n} \sum_{j=1}^{t} (X_{ij} - \bar{X}_{..})^2 = \sum_{i=1}^{n} \sum_{j=1}^{t} X_{ij}^2 - tn\bar{X}_{..}^2$ | nt - 1 |

Abbreviations: t = number of time points; n = number of dogs.
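The formulas of Table 9 can be checked against the orifice area data of Table 8. The Python sketch below is an illustration only: it computes the row, column, and total sums of squares, obtains the residual by subtraction, and forms the F ratio for time. The function name and the layout of the returned dictionary are hypothetical. The critical values 3.49 and 7.71 are copied from the text rather than computed.

```python
# Orifice areas from Table 8: one row per dog, columns are time points 1-4
AREAS = [
    [33, 42, 31, 30],
    [36, 43, 32, 27],
    [36, 39, 27, 23],
    [22, 25, 16, 9],
    [51, 60, 39, 34],
]


def randomized_block_anova(data):
    """Repeated measurements ANOVA; data[i][j] is unit i observed at time j."""
    n, t = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * t)
    row_means = [sum(row) / t for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(t)]

    ss_rows = t * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    # Computing form of the total: sum of squares minus t * n * grand mean squared
    ss_total = sum(x * x for row in data for x in row) - t * n * grand ** 2
    ss_resid = ss_total - ss_rows - ss_cols        # residual by subtraction

    df_cols, df_resid = t - 1, (n - 1) * (t - 1)
    ms_cols = ss_cols / df_cols
    ms_resid = ss_resid / df_resid
    return {
        "ss": {"rows": ss_rows, "columns": ss_cols,
               "residual": ss_resid, "total": ss_total},
        "df": {"rows": n - 1, "columns": df_cols,
               "residual": df_resid, "total": n * t - 1},
        "F_time": ms_cols / ms_resid,
    }


table = randomized_block_anova(AREAS)

F_USUAL = 3.49         # F(0.05; 3, 12), as quoted in the text
F_CONSERVATIVE = 7.71  # F(0.05; 1, 4), Greenhouse-Geisser value quoted in the text

print({k: round(v, 2) for k, v in table["ss"].items()})
print("F for time:", round(table["F_time"], 1))
print("exceeds usual critical value:", table["F_time"] > F_USUAL)
print("exceeds conservative critical value:", table["F_time"] > F_CONSERVATIVE)
```

For these data the F ratio for time is about 31.8, which exceeds both the usual and the conservative critical values, so the conclusion of significant variation over time does not depend on the assumption of equal correlations.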


slopes is informative of differences in the trends. Tests for differences between the groups with respect to the means and slopes can be computed by the one-way analysis of variance described above.

The investigator interested in any possible trend over time (or analyzing a study in which some replication is taken on a factor other than time) can perform a repeat measurements analysis of variance as described by Armitage (1971, page 253). However, as noted above, the validity of the assumptions required for the analysis is questionable, and the researcher is advised to consult a biostatistician in analyzing such trials. This analysis, as well as the analysis for differences between groups with respect to the means and slopes, also can be performed using program BMDP2V of the BMDP series (Dixon and Brown, 1977).

TABLE 10 Analysis of Orifice Areas in Table 8

Analysis of variance table

| Source of variation | Sum of squares | df | Mean square | F |
|---|---|---|---|---|
| Dog (row) | 1600 | 4 | 400 | |
| Time (column) | 852.6 | 3 | 284.2 | 31.8** |
| Residual | 107.2 | 12 | 8.93 | |
| Total | 2559.8 | 19 | | |

** Significant at P < 0.01 level.

References

Armitage P (1971) Statistical Methods in Medical Research. New York, John Wiley & Sons
Coulson RL, Yardanfar S, Rubeo E, Bove A, Lemole G, Spann J (1977) Recuperative potential of cardiac muscle following relief of pressure overload hypertrophy and right ventricular failure in the cat. Circ Res 40: 41-49
Cutilletta AF, Erinoff L, Heller A, Low J, Oparil S (1977) Development of left ventricular hypertrophy in young spontaneous hypertensive rats after peripheral sympathectomy. Circ Res 40: 428-433
Dixon WJ, Brown MG (1977) Biomedical Computing Programs, P Series. Berkeley, University of California Press
Dunnett CW (1964) New tables for multiple comparisons with a control. Biometrics 20: 482-491
Glantz SA (1980) Biostatistics; how to detect, correct, and prevent errors in the medical literature. Circulation 61: 1-7
Greenhouse SW, Geisser S (1954) On methods in the analysis of profile data. Psychometrika 24: 95-112
Hollander M, Wolfe D (1973) Nonparametric Statistical Methods. New York, John Wiley & Sons
Miller JG (1966) Simultaneous Statistical Inference. New York, McGraw-Hill
Nie NH, Hull CH, Jenkins JG, Steinbrenner K, Bent D (1975) SPSS, Statistical Package for the Social Sciences. New York, McGraw-Hill
Rosen MR, Hoffman BF (1978) Editorial: Statistics, Biomedical Scientists, and Circulation Research. Circ Res 42: 739
SAS Institute Inc. (1979) SAS Users Guide, 1979 Edition. Raleigh, North Carolina
Scheffe H (1959) The Analysis of Variance. New York, John Wiley & Sons
Snedecor GW, Cochran WG (1967) Statistical Methods, ed. 6. Ames, Iowa, The Iowa State University Press
Tukey JW (1949) Comparing individual means in the analysis of variance. Biometrics 5: 99-114
Wallenstein S, Fleiss JL (1979) Repeated measurements analysis of variance when the correlations have a certain pattern. Psychometrika 44: 229-233
Yellin EL, Yoran C, Sonnenblick EH, Gabbay S, Frater RWM (1979) Dynamic changes in the canine mitral regurgitant orifice area during ventricular ejection. Circ Res 45: 677-683
