Summary of Statistical Formulae
Downloaded from https://www.cambridge.org/core. University of Sussex Library, on 17 Apr 2018 at 09:28:55, subject to the
Cambridge Core terms of use, available at https://www.cambridge.org/core/terms.
https://doi.org/10.1017/CBO9781139170840.019
GUIDE TO SUMMARY
Types of distribution
First examine the data and decide which of the four following
main types of problem is involved:
Table 32.

Type of          Size of         One sample:        Comparing          Comparing
distribution     sample          comparison with    two samples        several samples
                                 a standard

Normal           Large           3, 4               5                  35
(1, 2)           Small           6                  7                  35

Binomial         Large           9, 10              11                 20, 17
(8)              Small           (See Biometrika    20, 18, 19         20, 17
                                 Tables)

Poisson          $n\bar{x}$ large      13                 14                 16
(12)             Single small    (See Biometrika    (See Biometrika    21
                 values          Tables)            Tables)

Non-normal       Large or        38, 39             40 (See
or unknown       small                              Ciba-Geigy,
                                                    Scientific Tables)
GUIDE TO FORMULAE
1. Formula for the 'normal' (or Gaussian) distribution (See
section 2.1)
The normal curve is
[...]
The sample mean is
    $\bar{x} = \frac{\sum x}{n}$,    (4)
where $\sum x$ means that all observations in the sample are to be
added together, repeated values being included as many times as
they appear.
(More elaborately, we could write $\sum x$ as
$x_1 + x_2 + \cdots + x_n$ or $\sum_{i=1}^{n} x_i$.)
The quantity $\bar{x}$ is often used to estimate the unknown population
mean $\mu$. The sample variance is given by
(7)
    $d = \frac{\bar{x} - \mu}{s/\sqrt{n}}$,    (13)
where d is a normal variable with zero mean and unit standard
deviation. Significance is judged by reference to Appendix 1.
Thus for a two-sided test the result is significant at the 5 per cent
level if the absolute value of d (i.e. without regard for sign) is
greater than 1.96.
Confidence limits for the unknown population mean $\mu$ are
    $\bar{x} - ds/\sqrt{n}$ to $\bar{x} + ds/\sqrt{n}$,    (14)
where the multiplying factor d is chosen from Appendix 1 according
to the probability required. Thus if d = 1.96 just 5 per cent of
the total frequency is excluded, and we have a 95 per cent
confidence range.
It is worth noticing that if $\sigma$ happens to be known exactly, we
can use it instead of s in formulae (13) and (14), which are then
valid even for small n.
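
The large-sample test and confidence limits of formulae (13) and (14) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the sample variance is assumed to use the conventional divisor n - 1 (its formula is not legible in this copy), and the default d = 1.96 is the 5 per cent two-sided value quoted above.

```python
import math

def mean_and_sd(xs):
    """Return the sample mean and standard deviation s."""
    n = len(xs)
    mean = sum(xs) / n
    # Assumption: the usual divisor n - 1 for the sample variance.
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, math.sqrt(variance)

def d_statistic(xs, mu):
    """Formula (13): d = (mean - mu) / (s / sqrt(n))."""
    mean, s = mean_and_sd(xs)
    return (mean - mu) / (s / math.sqrt(len(xs)))

def confidence_limits(xs, d=1.96):
    """Formula (14): mean - d*s/sqrt(n) to mean + d*s/sqrt(n)."""
    mean, s = mean_and_sd(xs)
    half_width = d * s / math.sqrt(len(xs))
    return mean - half_width, mean + half_width
```

The method is intended for large n; a short list is enough to check the arithmetic, but the normal approximation should not be trusted for it.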
    $d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$,    (15)
and refer d to Appendix 1 as before.
Confidence limits for the difference, $\mu_1 - \mu_2$, between the two
true means are
[...]
    $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$    (20)
and refer to Appendix 2.
Note that when this test is applied to 'paired-comparison' tests
$\mu$ will usually be zero.
Confidence limits for the unknown population mean $\mu$ must
also be based on 'Student's' t, and are
where the samples are labelled so that $s_1^2$ is greater than $s_2^2$. We
then refer to tables of the variance ratio (e.g. Appendix 5 in this
book, Fisher & Yates' Statistical Tables or Biometrika Tables for
Statisticians), to find the appropriate value of F for the required
level of significance corresponding to $f_1 = n_1 - 1$ degrees of
freedom in the numerator and $f_2 = n_2 - 1$ in the denominator.
Note that in a straightforward comparison of two samples we
usually require a two-sided test and must enter the tables at half
the chosen probability. In applications to the analysis of variance,
on the other hand, we normally need a one-sided test, and
therefore use the significance levels as they stand.
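
As a rough sketch of the labelling rule above, the following Python function computes both sample variances, puts the larger in the numerator, and returns the ratio with its two degrees of freedom. The function name is hypothetical and the divisor n - 1 for each variance is an assumption; the looking-up of F in tables is left to the reader.

```python
def variance_ratio(sample_a, sample_b):
    """Return (F, f1, f2) with the larger sample variance in the numerator."""
    def variance(xs):
        mean = sum(xs) / len(xs)
        # Assumption: sample variance with divisor n - 1.
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    # Relabel so that sample 1 is the one with the larger variance.
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1
```

For a two-sided comparison at the 5 per cent level, the returned F would be compared with the tabulated 2.5 per cent point for (f1, f2) degrees of freedom.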
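    $t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$,    (21)
where
    $s^2 = \frac{\sum_1 (x - \bar{x}_1)^2 + \sum_2 (x - \bar{x}_2)^2}{n_1 + n_2 - 2}$,    (22)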
and refer to Appendix 2.
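
A minimal Python sketch of this two-sample calculation is given below. It assumes the standard pooled form, in which the two sums of squares about the sample means are added and divided by n1 + n2 - 2, and t is the difference of means divided by s times the square root of (1/n1 + 1/n2). The function name is hypothetical.

```python
import math

def pooled_t(sample_1, sample_2):
    """Return (t, degrees of freedom) for two small samples, equal variances assumed."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    # Sums of squares about each sample's own mean.
    ss_1 = sum((x - mean_1) ** 2 for x in sample_1)
    ss_2 = sum((x - mean_2) ** 2 for x in sample_2)
    df = n1 + n2 - 2
    s = math.sqrt((ss_1 + ss_2) / df)  # pooled standard deviation
    t = (mean_1 - mean_2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, df
```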
Confidence limits for the difference, $\mu_1 - \mu_2$, between the two
true means are
n2
but now treat this as being distributed approximately like 'Student's'
t with f degrees of freedom, the latter being given by
    $f = \frac{1}{\frac{u^2}{n_1 - 1} + \frac{(1 - u)^2}{n_2 - 1}}$,    (26)
where
    $u = \frac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}$.
Confidence limits for the difference, $\mu_1 - \mu_2$, between the two
true means are given by formula (16) in 5 above, remembering
that we are here treating d as a t with f degrees of freedom. Note
that non-integral values of f must be treated by appropriate
interpolation.
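
The approximate degrees of freedom of formula (26) can be sketched in Python as below. The definition of u used here, the first sample's share s1^2/n1 of the total s1^2/n1 + s2^2/n2, is an assumption taken from the usual form of this approximation, and the function name is hypothetical.

```python
def approximate_df(var_1, n1, var_2, n2):
    """Approximate degrees of freedom f for two samples with unequal variances."""
    a = var_1 / n1
    b = var_2 / n2
    u = a / (a + b)  # assumed definition of u
    return 1.0 / (u ** 2 / (n1 - 1) + (1 - u) ** 2 / (n2 - 1))
```

When the two variances and sample sizes are equal, u = 1/2 and f reduces to 2(n - 1), the value used in the equal-variance test; otherwise f is generally non-integral, which is why interpolation in the t table is needed.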
    $\frac{n!}{a!\,(n - a)!}\, p^a q^{n-a}$,    (2)
where a may be any whole number from 0 to n. The factorial $x!$
means $1 \cdot 2 \cdot 3 \cdots (x - 1)x$, and 0! and 1! are both taken to be
unity. The mean of the distribution of a is np, and the variance is
npq (see section 2.5). The probability p may be estimated from
the sample by a/n, which has mean p and variance pq/n.
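
The binomial probabilities of formula (2) are easy to tabulate directly. The sketch below assumes q = 1 - p and uses a hypothetical function name; it relies only on `math.comb` from the Python standard library.

```python
import math

def binomial_probability(a, n, p):
    """Probability of exactly a successes in n trials, formula (2)."""
    q = 1.0 - p  # assumption: q denotes 1 - p
    return math.comb(n, a) * p ** a * q ** (n - a)

def binomial_mean(n, p):
    """Mean of a computed from the probabilities; should agree with n*p."""
    return sum(a * binomial_probability(a, n, p) for a in range(n + 1))
```

Summing the probabilities over a = 0, 1, ..., n gives unity, and the computed mean agrees with np, which is a convenient check on any hand calculation.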
(11)
    $d = \frac{(a/n) - p}{\sqrt{\frac{p(1 - p)}{n}}}$,    (17)
and refer d to Appendix 1 as before.
n\ n2 n\ + n2
Then find
d=—k^—_
n\ n2
to
(18a)
n2
where d is chosen from Appendix 1 to correspond to the required
probability.
    $\frac{e^{-m} m^x}{x!}$,    (3)
    $\bar{x} \pm \sqrt{\frac{\bar{x}}{n}}$.    (12)
Confidence limits can be calculated as for normal distributions.
(For small values of Poisson variables, confidence limits can be
obtained from Biometrika Tables for Statisticians.)
nx n2
and referring d to Appendix 1 as before. For this test to be
reasonably reliable, it is as well to have $n_1\bar{x}_1$ and $n_2\bar{x}_2$ each larger
than 30.
Confidence limits for the difference $m_1 - m_2$ between the two
unknown Poisson parameters are
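    $(\bar{x}_1 - \bar{x}_2) - d\sqrt{\frac{\bar{x}_1}{n_1} + \frac{\bar{x}_2}{n_2}}$ to $(\bar{x}_1 - \bar{x}_2) + d\sqrt{\frac{\bar{x}_1}{n_1} + \frac{\bar{x}_2}{n_2}}$,    (19a)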
where d is chosen from Appendix 1 to correspond to the required
probability.
(To compare two small single observations we can refer to
Biometrika Tables for Statisticians. Note that the table given is
'one-tailed' as it stands.)
    $\chi^2 = \sum \frac{(O - E)^2}{E}$,    (27)
where the summation is over all compartments. Care must be
taken to use the right number of degrees of freedom. For a
contingency table of r rows and c columns this is (r - 1)(c - 1).
When fitting a frequency distribution to data grouped into k
classes, the number of degrees of freedom is k — 1 minus the
number of constants estimated from the data (e.g. two for the
normal distribution).
In general, every additional independent constraint removes
one degree of freedom.
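
For a contingency table, the calculation and its degrees of freedom can be sketched as follows. The sketch assumes the usual form of the statistic, the sum over all compartments of (observed - expected)^2 / expected, with each expected number obtained as row total times column total divided by the grand total. The function name is hypothetical.

```python
def chi_squared(table):
    """Return (chi-squared, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected number under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df
```

Fixing the row and column totals is what removes degrees of freedom here: of the rc compartments only (r - 1)(c - 1) can be filled in freely.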
Table 6
a b A
c d
B N
n N
(28)
- k) - k)
where k = A/N, 1 - k = B/N, and there are c - 1 degrees of
freedom. The method is valid only if no expected number is less
than about 5.
a + c b + d n
20. Homogeneity test for several binomial samples (see section 8.4)
n,
(35)
2
5> ,
Remember that the values of some items may be repeated,
especially if calculations are made on grouped data, and must be
counted as many times as they occur. Next calculate the three
quantities in the first line of formula (37) below, and subtract
each of these from the corresponding quantities in the last line of
formula (35) to give the second line of formula (37):
(37)
- y) (39)
can then use the estimate with attached standard error given by
r±1 ^ . (40)
    $z = \tfrac{1}{2}\log\frac{1 + r}{1 - r}$,
    $\zeta = \tfrac{1}{2}\log\frac{1 + \rho}{1 - \rho}$.    (42)
The difference $z - \zeta$ is then normally distributed with zero mean
and variance 1/(n - 3). Significance is tested as for normal
variables.
Notice that the logarithms on the right of formula (42) are
natural logarithms. We can, however, obtain z (or $\zeta$) directly,
given r (or $\rho$), by reference to Biometrika Tables for Statisticians.
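
In place of the tables, the transformation and the resulting normal deviate can be computed directly. The sketch below assumes the usual form z = (1/2) log{(1 + r)/(1 - r)} with natural logarithms, and uses hypothetical function names.

```python
import math

def fisher_z(r):
    """z-transformation of a correlation coefficient (natural logarithms)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_test(r, rho, n):
    """Normal deviate for testing an observed r against a hypothetical rho."""
    standard_error = math.sqrt(1 / (n - 3))
    return (fisher_z(r) - fisher_z(rho)) / standard_error
```

The same transformation is available in the standard library as `math.atanh`, which gives a quick cross-check on the logarithmic form.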
    $r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$.    (67)
There are similar formulae for $r_{13.2}$ and $r_{23.1}$. We also have
analogous expressions to formula (67) for the true correlation
coefficients like $\rho_{12.3}$, etc.
With four variables represented by 1, 2, 3 and 4, we might
want to go a stage further and eliminate the effects of both 3 and
4. This is done by calculating the coefficient $r_{12.34}$ given by
    $r_{12.34} = \frac{r_{12.4} - r_{13.4} r_{23.4}}{\sqrt{(1 - r_{13.4}^2)(1 - r_{23.4}^2)}}$,
where coefficients on the right-hand side like $r_{12.4}$ must be
calculated from formulae like formula (67).
Partial correlation coefficients can be tested for significance in
much the same way as total correlation coefficients. We simply
use formulae like formulae (40), (41) and (42) in 24, 25 and 26
above, with the modification that n is replaced by n minus the
number of variables that have been eliminated from the comparison in
question. For such significance tests to be valid, variables not
eliminated from any comparison should follow a multivariate
normal distribution.
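
The step from total to first-order and then second-order partial coefficients can be sketched in Python. The sketch assumes the standard first-order form, (r12 - r13 r23) divided by the square root of (1 - r13^2)(1 - r23^2), applied a second time with variable 4 already eliminated. Function and argument names are hypothetical.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation of 1 and 2 with 3 eliminated."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def second_order_partial_r(r12, r13, r14, r23, r24, r34):
    """Partial correlation of 1 and 2 with both 3 and 4 eliminated."""
    # First eliminate variable 4 from each pair among 1, 2 and 3.
    r12_4 = partial_r(r12, r14, r24)
    r13_4 = partial_r(r13, r14, r34)
    r23_4 = partial_r(r23, r24, r34)
    # Then eliminate variable 3 using the same formula.
    return partial_r(r12_4, r13_4, r23_4)
```

With every total correlation equal to 0.5, for instance, eliminating one variable reduces the coefficient to 1/3 and eliminating two reduces it to 1/4.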
b =
(44)
- x)(y - y)
2
1 2
V - - y) - • (47)
n —2
b ± (48)
^ \2
    $t = \frac{b_1 - b_2}{s\sqrt{\frac{1}{\sum_1 (x - \bar{x}_1)^2} + \frac{1}{\sum_2 (x - \bar{x}_2)^2}}}$,
where
    $s^2 = \frac{(n_1 - 2)s_1^2 + (n_2 - 2)s_2^2}{n_1 + n_2 - 4}$.
u=
Confidence limits for the difference $\beta_1 - \beta_2$ are obtained as
usual with t-variables.
fitted regression is then
    $y = a + b_1 x_1 + b_2 x_2$,    (70)
where
    $a = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2$.    (71)
We start by calculating the sums, and sums of squares and
products, according to the scheme
i> 2*2,
(72)
(74)
    $s^2 = \frac{S - b_1 D - b_2 E}{n - 3}$.    (77)
The standard errors of the two estimated regression coefficients
are
AC- B2
and (78)
s
b, =
AC - B2
y = or /3 2 x 2 (79)
(82)
where
    $C = G^2/N$,    (53)
and
    $\sum T_i^2/n_i - C$.    (54)
The summation in formula (53) is over all observations and in
formula (54) over all k varieties.
The analysis of variance is shown in Table 16.
Table 15
"2
Xi
k » Xknk nk xk
Table 16
Source of Sum of Degrees of Mean
variation squares freedom squares
Total N- 1 -
(58)
u \ (59)
v
where
    $C = G^2/bt$,
and
/ - c. (60)
The analysis of variance table is as in Table 21.
The block and treatment effects are tested by examining the
variance-ratios $M_B/s^2$ and $M_T/s^2$.
Table 20
Block
Treatment Treatment
Treatment 1 total mean
xlb
Table 21

Source of      Sum of squares        Degrees of          Mean
variation                            freedom             square

Treatment      $\sum T^2/b - C$      t - 1               $M_T$
Blocks         [...]                 b - 1               $M_B$
Residual       By subtraction        (t - 1)(b - 1)      $s^2$
Total          $\sum x^2 - C$        bt - 1
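    $s^2 = \frac{\sum f_i s_i^2}{f}$,    (65)
where
    $f_i = n_i - 1$ and $f = \sum f_i$.
Then find
    $\frac{1}{C}\left\{ f \log_{10} s^2 - \sum f_i \log_{10} s_i^2 \right\}$,
where    (66)
    $C = 0.4343\left\{ 1 + \frac{1}{3(k - 1)}\left( \sum \frac{1}{f_i} - \frac{1}{f} \right) \right\}$.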
The quantity in the first line of formula (66) is then distributed
approximately like $\chi^2$ with k - 1 degrees of freedom.
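
A sketch of this homogeneity-of-variance calculation in Python is given below. It assumes the standard form of the test: a pooled variance weighted by the degrees of freedom f_i = n_i - 1, the quantity f log10(s^2) minus the sum of f_i log10(s_i^2), and a divisor C = 0.4343{1 + [sum(1/f_i) - 1/f]/[3(k - 1)]}. The function name and the argument layout are hypothetical.

```python
import math

def variance_homogeneity_statistic(variances, sizes):
    """Return (statistic, degrees of freedom) for k sample variances.

    variances: the k sample variances s_i^2
    sizes:     the k sample sizes n_i
    """
    k = len(variances)
    dfs = [n - 1 for n in sizes]  # f_i = n_i - 1
    f = sum(dfs)
    pooled = sum(fi * v for fi, v in zip(dfs, variances)) / f
    m = f * math.log10(pooled) - sum(
        fi * math.log10(v) for fi, v in zip(dfs, variances)
    )
    # 0.4343 converts common logarithms to natural ones (1 / ln 10).
    c = 0.4343 * (1 + (sum(1 / fi for fi in dfs) - 1 / f) / (3 * (k - 1)))
    return m / c, k - 1
```

The statistic is zero when all the sample variances are equal and grows as they diverge; it is referred to the chi-squared table with k - 1 degrees of freedom.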
39. Wilcoxon's signed rank sum test for a single sample (see
section 15.4)
A distribution-free test that is more powerful, in the
paired-comparison situation, than the sign test of section 38 is
Wilcoxon's test for a single sample. We first put all the observations in
ascending order of magnitude, ignoring the signs. Zero values are
rejected altogether, and the remaining non-zero values are
assigned the ranks 1 to n. If any observations are numerically equal
(or tied) they are each assigned an average rank calculated from
the ranks that would otherwise have been used.
Then calculate the sum T of the ranks attached to observations
that are in fact positive. Appendix 6 then gives the extreme
values of T, for all n ≤ 25, that would have to be attained or
exceeded for significance to occur.
For larger values of n use the approximation given by
    $d = \{T - \tfrac{1}{4}n(n + 1)\}/\sqrt{v}$,    (88)
where
    $v = n(n + 1)(2n + 1)/24$,    (89)
and we can take d to be normally distributed with zero mean and
unit standard deviation. When tied ranks occur, each group of t
tied ranks reduces v by $(t^3 - t)/48$.
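
The ranking procedure and the large-sample approximation can be sketched in Python as follows. The sketch assumes that d is formed as T minus its expected value n(n + 1)/4, divided by the square root of v, without a continuity correction. The function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values in ascending order; ties share the average rank.

    Returns the ranks (in the original order) and the size of each tie group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    group_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        group_sizes.append(j - i + 1)
        i = j + 1
    return ranks, group_sizes

def signed_rank_test(observations):
    """Return (T, v, d) for the single-sample signed rank sum test."""
    nonzero = [x for x in observations if x != 0]  # zeros are rejected
    n = len(nonzero)
    ranks, group_sizes = average_ranks([abs(x) for x in nonzero])
    T = sum(rank for x, rank in zip(nonzero, ranks) if x > 0)
    # Formula (89), reduced by (t^3 - t)/48 for each group of t tied ranks.
    v = n * (n + 1) * (2 * n + 1) / 24 - sum(t ** 3 - t for t in group_sizes) / 48
    d = (T - n * (n + 1) / 4) / math.sqrt(v)
    return T, v, d
```

Groups of size one contribute nothing to the tie correction, since t^3 - t is then zero. For n of 25 or fewer the exact table in Appendix 6 should be preferred to d.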
40. Wilcoxon's rank sum test for two samples (see section 15.5)
Suppose that there are two samples to be compared, comprising
the n values $x_1, x_2, \ldots, x_n$ and the m values $y_1, y_2, \ldots, y_m$. We
wish to make a distribution-free test of the hypothesis that the x's
and y's have identical distributions, but are primarily interested
in differences of location, i.e. differences between mean values.
We first combine the observations from both samples, put
them in ascending order of magnitude and assign ranks, making
sure that the x and y components can be easily identified later.
Tied values are treated as in 39 above.
We then calculate the sum T of the ranks of all the x's.
Significance is determined by reference to tables showing the
extreme values of T that have to be attained or exceeded. (See
Ciba-Geigy's Scientific Tables, which cover the values n ≤ 25,
m ≤ 50. Note that in these tables $n = N_1$, $m = N_2$.)
A convenient approximation for values of n and m beyond the
tables is given by
    $d = \{T - \tfrac{1}{2}n(n + m + 1)\}/\sqrt{v}$,    (90)
where
v = nm(n + m + 1)/12, (91)
and we can take d to be normally distributed with zero mean and
unit standard deviation.
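
The two-sample procedure can be sketched in the same way. The Python function below combines the samples, assigns average ranks to ties, sums the ranks of the x's, and applies formulae (90) and (91). It uses the untied variance only, so it is an illustration for data without ties; the function names are hypothetical.

```python
import math

def combined_ranks(values):
    """Ranks in ascending order, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(xs, ys):
    """Return (T, d): the rank sum of the x's and its normal approximation."""
    n, m = len(xs), len(ys)
    ranks = combined_ranks(list(xs) + list(ys))
    T = sum(ranks[:n])  # the first n ranks belong to the x's
    v = n * m * (n + m + 1) / 12  # formula (91), valid when there are no ties
    d = (T - n * (n + m + 1) / 2) / math.sqrt(v)
    return T, d
```

Keeping the x's first in the combined list is what allows their ranks to be picked out afterwards, which is the practical meaning of making sure the two components can be identified later.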
When ties occur we must write v as
    $v = \frac{nm(N^3 - N - R)}{12N(N - 1)}$,    (92)
where
    $N = n + m$,    (93)