
3

FREQUENCY DISTRIBUTIONS

Replicate analyses on a single sample, or analyses of replicate samples, will always show a
variation among results. A variable quantity can be either continuous, that is it can assume
any value within a given range, or discrete, that is it assumes only whole number (integer)
values. Continuous variables are normally measurements (e.g. height and weight, pH val-
ues, chemical composition data, time to obtain a particular change during incubation),
although an effective discontinuity is introduced by the limitations of measurement (e.g.
length to the nearest 0.1 mm, acidity to the nearest 0.01 pH unit, etc.). Discrete variables
are typified by whole number counts, for example the number of bacteria in a sample unit,
the number of insects in a sack of corn, etc.
When a large number of measurements or counts has been done, the observations can be
organized into frequency classes to derive a frequency distribution. Since counts are discon-
tinuous, each class in the frequency distribution will be an integer or a range of integers and
the number of observations falling into that class will be the class frequency. When more
than one integer is combined, the classes must not overlap and, although not essential, it is
usual to take equal class intervals in simple frequency analyses.
Although the form of a frequency distribution can be seen from a tabulation of numeri-
cal data, it is more readily recognized in a histogram (i.e. a bar chart), where the areas of the
rectangles are proportional to the frequency. If the class intervals are equal, then the height
of each column in a bar chart is proportional to the frequency. An example of the deriva-
tion of frequency distributions is given in Example 3.1. The histograms shown in Figs. 3.1
and 3.2 illustrate the effect of changing the relative positions of the class boundaries on the
apparent shapes of the frequency distributions.

Statistical Aspects of the Microbiological Examination of Foods


Copyright © 2008 by Academic Press. All rights of reproduction in any form reserved.

[Figure 3.1: three histograms of Frequency (f) against Weight of water (g) delivered by pipette. (a) Class intervals 0.005 g; (b) class intervals 0.010 g, axis labelled from 0.94 to 1.08; (c) class intervals 0.010 g, axis labelled from 0.945 to 1.085.]

FIGURE 3.1 Frequency distribution histograms for data (from the section Variability in Delivery from a 1 ml
(1 cm3) Pipette in Example 3.1) for the delivery of distilled water from a 1-cm3 pipette; the mean value is 0.984 g
and the standard deviation is 0.028 g.


[Figure 3.2: two histograms of Frequency (f) against Number of bacteria/field. (a) Class interval = 1, axis labelled from 4 to 26; (b) class interval = 3, classes 4–6, 7–9, 10–12, 13–15, 16–18, 19–21, 22–24 and 25–27.]

FIGURE 3.2 Frequency distribution of microscopic counts of bacteria having a mean count of 12.77 bacteria/field and a variance of 24.02 (from the section Variability in Numbers of Bacterial Cells Counted Microscopically in Example 3.1).


For a frequency distribution the arithmetic mean value ($\bar{x}$) can be derived using the formula $\bar{x} = \sum fX / n$, where X is the mid-point value of the frequency class and f is the frequency of values within the same class. Hence, for the distribution shown in Fig. 3.1(c):

$$\bar{x} = [(0.950 \times 3) + (0.960 \times 10) + (0.970 \times 9) + \dots + (1.090 \times 1)]/50 = 49.21/50 = 0.9842$$

The unbiased estimate of the population variance ($s^2$) is given by:

$$s^2 = \frac{\sum (fX^2) - \bar{x} \sum fX}{n - 1}$$

For $\bar{x} = 0.9842$, $\sum fX = 49.21$ and $\sum (fX^2) = 3(0.95)^2 + 10(0.96)^2 + 9(0.97)^2 + \dots + (1.09)^2 = 48.472$. Hence, the estimate of variance is given by $s^2 = (48.472 - 0.9842(49.21))/49 = 0.0008065$ and the standard deviation is given by $s = \sqrt{0.0008065} = 0.0284$.
Thus, for the data cited, the mean weight of water delivered from a 1-cm3 pipette was
0.984 g, and the standard deviation was 0.0284 g. (Using the method illustrated in Example
2.1, the calculated mean value was 0.983 g and the standard deviation was 0.0287 g.)
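The grouped-data calculation above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `grouped_stats` is hypothetical, and the mid-points and frequencies are those of the 0.010 g classes of Fig. 3.1(c). Because the code carries full precision rather than the rounded sum 48.472, its standard deviation can differ from the hand calculation in the fourth decimal place.

```python
from math import sqrt

# Class mid-points (g) and frequencies for the 0.010 g classes of Fig. 3.1(c)
midpoints = [0.950, 0.960, 0.970, 0.980, 0.990, 1.000,
             1.010, 1.020, 1.030, 1.050, 1.060, 1.090]
freqs = [3, 10, 9, 9, 7, 4, 3, 1, 1, 1, 1, 1]


def grouped_stats(mids, fs):
    """Return (mean, unbiased variance) from class mid-points and frequencies."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(mids, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(mids, fs))
    mean = sum_fx / n
    variance = (sum_fx2 - mean * sum_fx) / (n - 1)
    return mean, variance


mean, variance = grouped_stats(midpoints, freqs)
print(f"n = {sum(freqs)}, mean = {mean:.4f} g, s = {sqrt(variance):.4f} g")
```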

EXAMPLE 3.1 DERIVATION OF THE MEAN, VARIANCE AND FREQUENCY DISTRIBUTION

Variability in delivery from a 1 ml (1 cm3) pipette

A pipette was used to transfer 1 cm3 of distilled water into a tared weighing boat. The
weight of water delivered was determined on an analytical balance. The experiment was
performed 50 times and gave the following results (g water):
0.948, 1.012, 1.085, 1.063, 1.010, 1.000, 0.994, 0.986, 0.995, 0.999, 0.969, 0.965, 0.945,
0.977, 0.957, 0.946, 0.960, 0.955, 1.010, 0.965, 0.975, 0.972, 0.957, 0.961, 0.975, 0.988,
0.989, 0.974, 0.980, 0.980, 1.001, 0.977, 1.021, 1.051, 0.965, 0.963, 0.971, 0.983, 0.962,
0.984, 0.978, 0.968, 0.960, 1.027, 0.959, 0.985, 0.985, 0.967, 0.960, 0.992.

The data are organized in distribution frequency classes as shown below (Table 3.1 and
Fig. 3.1).
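As a rough illustration of how the 50 weights are sorted into classes, and of how the tally changes when the class boundaries are shifted, the following sketch may help. The helper name `class_frequencies` is hypothetical; weights are converted to whole milligrams so that values lying on a class boundary are not misplaced by floating-point rounding.

```python
from collections import Counter

weights_g = [
    0.948, 1.012, 1.085, 1.063, 1.010, 1.000, 0.994, 0.986, 0.995, 0.999,
    0.969, 0.965, 0.945, 0.977, 0.957, 0.946, 0.960, 0.955, 1.010, 0.965,
    0.975, 0.972, 0.957, 0.961, 0.975, 0.988, 0.989, 0.974, 0.980, 0.980,
    1.001, 0.977, 1.021, 1.051, 0.965, 0.963, 0.971, 0.983, 0.962, 0.984,
    0.978, 0.968, 0.960, 1.027, 0.959, 0.985, 0.985, 0.967, 0.960, 0.992,
]


def class_frequencies(values_g, start_mg, width_mg):
    """Map each class lower bound (mg) to the number of values in that class."""
    counts = Counter((round(v * 1000) - start_mg) // width_mg for v in values_g)
    return {start_mg + k * width_mg: counts[k] for k in sorted(counts)}


# Same 0.010 g class width, two different sets of class boundaries
freq_b = class_frequencies(weights_g, start_mg=940, width_mg=10)  # boundaries as Fig. 3.1(b)
freq_c = class_frequencies(weights_g, start_mg=945, width_mg=10)  # boundaries as Fig. 3.1(c)

for lower, f in freq_c.items():
    print(f"{lower / 1000:.3f}-{(lower + 9) / 1000:.3f} g  {'#' * f} ({f})")
```

Empty classes are simply omitted from the returned dictionary, which mirrors the elided rows of the printed table.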

TABLE 3.1
Arrangement of Data in Frequency Classes

(a) Class interval 0.005 g (Fig. 3.1(a))

Frequency class boundaries (g)   Frequency (f)
0.9400–0.9449                    0
0.9450–0.9499                    3
0.9500–0.9549                    0
0.9550–0.9599                    4
0.9600–0.9649                    6
0.9650–0.9699                    6
0.9700–0.9749                    3
0.9750–0.9799                    5
0.9800–0.9849                    4
0.9850–0.9899                    5
0.9900–0.9949                    2
0.9950–0.9999                    2
1.0000–1.0049                    2
1.0050–1.0099                    0
1.0100–1.0149                    3
1.0150–1.0199                    0
1.0200–1.0249                    1
1.0250–1.0299                    1
1.0300–1.0349                    0
...
1.0500–1.0549                    1
1.0550–1.0599                    0
1.0600–1.0649                    1
...
1.0850–1.0899                    1
1.0900–1.0949                    0

(b) Class interval 0.010 g (Fig. 3.1(b))

Frequency class boundaries (g)   Frequency (f)
0.9400–0.9499                    3
0.9500–0.9599                    4
0.9600–0.9699                    12
0.9700–0.9799                    8
0.9800–0.9899                    9
0.9900–0.9999                    4
1.0000–1.0099                    2
1.0100–1.0199                    3
1.0200–1.0299                    2
...
1.0500–1.0599                    1
1.0600–1.0699                    1
...
1.0800–1.0899                    1

(c) Class interval 0.010 g (Fig. 3.1(c))

Frequency class boundaries (g)   Class mid-point (X)   Frequency (f)
0.9450–0.9549                    0.950                 3
0.9550–0.9649                    0.960                 10
0.9650–0.9749                    0.970                 9
0.9750–0.9849                    0.980                 9
0.9850–0.9949                    0.990                 7
0.9950–1.0049                    1.000                 4
1.0050–1.0149                    1.010                 3
1.0150–1.0249                    1.020                 1
1.0250–1.0349                    1.030                 1
...
1.0450–1.0549                    1.050                 1
1.0550–1.0649                    1.060                 1
...
1.0850–1.0949                    1.090                 1

Variability in numbers of bacterial cells counted microscopically (Data of Ziegler and Halvorson (1935): their appendix I, slide 1)

One hundred microscope fields on a single slide were examined and the number of bacte-
ria counted per field was determined. The frequency distributions of the data are derived
from the following:

No. bacteria/field (x)   Frequency (f)   Class (interval 3)   Class frequency
 4                        2
 5                        4              4–6                   9
 6                        3
 7                        8
 8                        5              7–9                  20
 9                        7
10                       11
11                        7              10–12                22
12                        4
13                        8
14                        6              13–15                19
15                        5
16                        4
17                        7              16–18                18
18                        7
19                        4
20                        1              19–21                 7
21                        2
22                        0
23                        3              22–24                 4
24                        1
25                        0
26                        1              25–27                 1
27                        0

n = 100

$$\bar{x} = \frac{\sum fX}{n} = \{(9 \times 5) + (20 \times 8) + (22 \times 11) + (19 \times 14) + (18 \times 17) + (7 \times 20) + (4 \times 23) + (1 \times 26)\}/100 = 12.77$$

where $\bar{x}$ = mean value, X = mid value of the frequency classes (i.e. 5, 8, 11, …) and f = the class frequency value.

$$s^2 = \frac{\sum f(X^2) - \bar{x} \sum fX}{n - 1} = \frac{18685 - 12.77(1277)}{99} = 24.0173$$

$$s = \sqrt{24.0173} = 4.9007$$

The frequency distributions are shown in Fig. 3.2.
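To see how pooling counts into classes of three affects the summary statistics, the calculation can be repeated for both class intervals. The sketch below is illustrative; `mean_var` is a hypothetical helper, and the frequencies are those tabulated above for 4 to 27 bacteria/field.

```python
counts = list(range(4, 28))  # bacteria per field, 4..27
freqs = [2, 4, 3, 8, 5, 7, 11, 7, 4, 8, 6, 5,
         4, 7, 7, 4, 1, 2, 0, 3, 1, 0, 1, 0]


def mean_var(xs, fs):
    """Mean and unbiased variance from values (or class mid values) and frequencies."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    mean = sum_fx / n
    return mean, (sum_fx2 - mean * sum_fx) / (n - 1)


# Pool consecutive triples (4-6, 7-9, ...) and represent each class by its mid value
group_mids = [counts[i + 1] for i in range(0, len(counts), 3)]
group_freqs = [sum(freqs[i:i + 3]) for i in range(0, len(freqs), 3)]

print("class interval 1:", mean_var(counts, freqs))
print("class interval 3:", mean_var(group_mids, group_freqs))
```

The class-interval-3 result corresponds to the hand calculation; the class-interval-1 result uses every count individually, so it need not agree exactly.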


TYPES OF FREQUENCY DISTRIBUTION

Mathematically defined frequency distributions can be used as models for experimental
data obtained from any population. Assuming that the experimental data fit one of the
models then, amongst other things:
1. The spatial (ecological) dispersion of the population can be described in mathematical
terms.
2. The variance of population parameters can be estimated.
3. Temporal and spatial changes in density can be compared.
4. The effect of changes in environmental factors can be assessed.
The mathematical models used most commonly for analysis of microbiological data include
the Normal (Gaussian), Binomial, Poisson and Negative Binomial distributions, which are
described below. The parameters of the distributions are summarized in Table 3.2.

TABLE 3.2
Some Continuous and Discrete Distribution Functions

Normal (Gaussian) (a)
  Domain: $-\infty < x < \infty$
  Probability density function $f_x$: $\frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$
  Restriction on parameters: $-\infty < \mu < \infty$, $\sigma > 0$
  Mean: $\mu$   Variance: $\sigma^2$

Binomial
  Domain: $x_s = s$, for $s = 0, 1, 2, \dots, n$
  Probability density function $f_x$: $\binom{n}{s} p^s (1-p)^{n-s}$
  Restriction on parameters: $0 < p < 1$, $q = 1 - p$
  Mean: $np$   Variance: $npq$

Poisson (b)
  Domain: $x_s = s$, for $s = 0, 1, 2, \dots$
  Probability density function $f_x$: $e^{-m} m^s / s!$
  Restriction on parameters: $0 < m < \infty$
  Mean: $m$   Variance: $m$

Negative binomial
  Domain: $x_s = s$, for $s = 0, 1, 2, \dots$
  Probability density function $f_x$: $\binom{n+s-1}{s} p^n (1-p)^s$
  Restriction on parameters: $n > 0$ and $0 < p < 1$ ($p = 1/Q$ and $1 - p = P/Q$)
  Mean: $nP$   Variance: $nPQ$

(a) The Normal distribution is the limiting form of the Binomial distribution when n → ∞ and p → 0.
(b) Limiting form of binomial, as p → 0, n → ∞.

STATISTICAL PROBABILITY

Probability is about chance and the likelihood that an event will, or will not, occur in any
specific situation. For instance, the probability that a specific person will win the jackpot in
the National Lottery is very low (about 1 in 14 million) because the odds against are most
improbable; yet people do win the Lottery, showing that no matter how improbable an
event there is always a chance that it will occur. By contrast, the chance of getting snow in
winter if you live in Norway, Russia or Canada is very high (one might say it is certain to
occur) but if you live in England the chance is not very high.
If a normal coin is tossed once it will fall to show either a ‘head’ or a ‘tail’ and there is a
50% probability for obtaining a head (or a tail) in a single throw. This is written p = 0.50,
q = 0.50 where (p + q) = 1; p is the probability that an event will occur (i.e. of obtaining a
‘head’) and q the probability of failure (i.e. that a head will not occur and conversely that a
‘tail’ will occur).
It is important to recognize that since there are two mutually independent ways of obtaining
either a head (H) or a tail (T), then for two or more coins we could get either H or T for
each coin. Hence if the first coin falls as an H, the second or subsequent coins could also fall
either H or T, that is the following sequences could occur: HH, HT, TH, TT. Therefore, by
tossing two coins there are 4 possible outcomes: HH (1 in 4 chances or P = 0.25), HT and
TH (which are the same; 2 in 4 chances or P = 0.50) or TT (1 in 4 chances or P = 0.25). If
we toss three coins there are 8 possible outcomes: HHH (0.125), HHT (including HTH and
THH; 0.375), HTT (including THT and TTH; 0.375) or TTT (0.125).
We can determine the number of outcomes quite simply in this case because for a single
toss of the coin the probability for obtaining one head is 0.5; with two coins the probability
of obtaining HH = 0.5 × 0.5 = (0.5)² = 0.25 (1 in 4); with three coins the probability of
HHH is (0.5)³, that is 0.125 or 1 in 8; and so on. We can generalize these probabilities by
saying that the relative probability of a specified outcome for any given number of trials (n)
is determined by Pⁿ, where P = the probability of that single independent event occurring
in a single trial. Pascal’s triangle (Fig. 3.3) provides a simple ‘ready reckoner’ for any
number of independent trials where the event is either positive or negative.
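The coin-tossing probabilities and the rows of Pascal's triangle can be generated directly from binomial coefficients. The following is a minimal sketch; the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def pascal_row(n):
    """Number of ways of getting 0..n heads in n tosses (row n of Pascal's triangle)."""
    return [comb(n, k) for k in range(n + 1)]


def heads_probabilities(n, p=0.5):
    """P(k heads in n independent tosses) for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]


for n in range(1, 5):
    print(n, pascal_row(n), heads_probabilities(n))
```

Each row sums to 2ⁿ, the total number of equally likely sequences, so dividing a row by its sum gives the probabilities for a fair coin.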
Provided that the outcome of one event (A) does not affect the outcome of a second
event (B), then the events are totally independent; however, if event A can affect the possible
outcome of event B then the events are not independent. Some events are mutually
exclusive, for instance, a ‘normal’ coin has both a head and a tail, so when a spun coin falls
it can show either a head or a tail; it cannot show both.

No. of trials    Possible outcomes

                      1
1                    1 1
2                   1 2 1
3                  1 3 3 1
4                 1 4 6 4 1

FIGURE 3.3 Pascal’s Triangle, a visual illustration of binomial outcomes from a series of trials, for instance
the toss of a coin. In a single trial there are 2 possible outcomes: a head or a tail, with equal probability (P = 0.5).
In 2 trials there are 4 possible outcomes: 2 heads (P = 0.25), 1 head and 1 tail (P = 0.5) or 2 tails (P = 0.25). In
3 trials there are 8 possible outcomes and in 4 trials there are 16 possible outcomes, etc. Note that each value in
a line is the sum of the values immediately above it. The figure can be expanded by adding data for additional
trials.
Suppose that we have a bag containing 20 balls: 4 red (R), 6 blue (B) and 10 green (G).
The probabilities of randomly drawing either 1R, or 1B or 1G are 4/20 (0.2), 6/20 (0.3)
or 10/20 (0.5), respectively. Assuming that after each draw the ball is returned to the bag,
the events are totally independent so that the probability that we can draw sequentially 1R, 1B
and 1G balls is: P(R)P(B)P(G) = P(R∩B∩G) = 0.2 × 0.3 × 0.5 = 0.03. Note that since the
events are totally independent the overall probability is the product of the individual
probabilities, as described in the ‘Multiplication Rule’.
Suppose further that half of the balls of each colour are marked with an odd number
(O) and the remainder with an even (E) number, and that we wish to draw a ball that
is either red or odd. Then P(R) = 0.2 and P(O) = 0.5. If these events were mutually
exclusive, the simple ‘Addition Rule’ would give P(R∪O) = P(R) + P(O) =
0.2 + 0.5 = 0.7. However, the events are not mutually exclusive, because a ball can be both
red and odd, and the simple sum counts the red–odd balls twice. The general ‘Addition Rule’
corrects for this: P(R∪O) = P(R) + P(O) − P(R∩O), where P(R∩O) signifies the probability
for both events to occur. Since 2 of the 20 balls are both red and odd, P(R∩O) = 0.1. Thus,
P(R∪O) = 0.2 + 0.5 − 0.1 = 0.6. If a ball that is both red and odd is to be excluded
altogether, the probability falls to 0.6 − 0.1 = 0.5.
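The multiplication and addition rules for the bag of balls can be checked by direct counting. This is a small sketch under the stated assumptions (2 of the 4 red, 3 of the 6 blue and 5 of the 10 green balls carry odd numbers); the helper `prob` is hypothetical.

```python
from fractions import Fraction

# Bag of 20 balls as (colour, parity); half of each colour carries an odd number
bag = ([("R", "odd")] * 2 + [("R", "even")] * 2 +
       [("B", "odd")] * 3 + [("B", "even")] * 3 +
       [("G", "odd")] * 5 + [("G", "even")] * 5)


def prob(event):
    """Probability that a single random draw satisfies `event`."""
    return Fraction(sum(1 for ball in bag if event(ball)), len(bag))


p_red = prob(lambda b: b[0] == "R")
p_blue = prob(lambda b: b[0] == "B")
p_green = prob(lambda b: b[0] == "G")
p_odd = prob(lambda b: b[1] == "odd")
p_red_and_odd = prob(lambda b: b[0] == "R" and b[1] == "odd")

# Multiplication rule: independent draws (ball returned to the bag each time)
p_r_then_b_then_g = p_red * p_blue * p_green

# General addition rule for events that are not mutually exclusive
p_red_or_odd = p_red + p_odd - p_red_and_odd

print(p_r_then_b_then_g, p_red_or_odd)
```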
Let us extend this concept. A pack of playing cards consists of 52 cards divided into 4
suits (spades, hearts, diamonds and clubs) each of which contains cards numbered from 1
(the ace) to 10 plus a jack, a queen and a king. If we shuffle the pack and then draw the top card the
chance of drawing an Ace is 1 in 13 (because there are 4 aces in the 52 cards)
but the chance of drawing the Ace of Spades is only 1 in 52. If we shuffle the cards and lay
the top four cards on the table, what is the chance that at least one of the cards laid down will
be an Ace? The cards are drawn without replacement, so the individual chances are not
independent of each other: if the first card is not an ace, then the chance that the second card
will be an ace is now 4 in 51 (because 1 card has already been drawn). It is simplest to work
from the chance that none of the 4 cards is an ace, which is
48/52 × 47/51 × 46/50 × 45/49 = 4,669,920/6,497,400 ≈ 0.719. So the overall chance of finding
at least 1 ace amongst 4 cards is 1 − 0.719 ≈ 0.281, or rather better than 1 in 4.
However, if the first card had been an ace, the chance that the second card would also be
an ace is now reduced to 3/51. Hence, the chance that all 4 cards are aces will be
4/52 × 3/51 × 2/50 × 1/49 = 24/6,497,400 ≈ 1 in 270,000.
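Card probabilities of this kind are easy to get wrong by hand, so an exact check is worthwhile. The sketch below assumes four cards dealt without replacement from a well-shuffled 52-card pack; the function name `p_no_ace` is hypothetical. It multiplies the conditional chances draw by draw and uses the complement rule for "at least one ace".

```python
from fractions import Fraction
from math import comb


def p_no_ace(cards_drawn, deck=52, aces=4):
    """P(no ace among the cards drawn), multiplying the conditional chances."""
    p = Fraction(1)
    for i in range(cards_drawn):
        p *= Fraction(deck - aces - i, deck - i)
    return p


p_at_least_one = 1 - p_no_ace(4)
p_all_four = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)

print(f"P(at least one ace in 4 cards) = {float(p_at_least_one):.4f}")
print(f"P(all 4 cards are aces) = 1 in {1 / p_all_four}")
```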

THE BINOMIAL DISTRIBUTION (σ² < μ)

If we toss a number of coins the average probability of equal numbers of heads and tails
is p = q = 0.5, but if all coins were ‘double-headed’, the probability of a ‘head’ occurring
would be p = 1.0 (i.e. there would be no chance of obtaining a ‘tail’ and q = 0). We can
therefore use the concept of probability to answer the general question: ‘What is the chance
that a specific event will occur?’ The probability scale ranges from P = 0 for impossible
events to P = 1 for certain events.
In the binomial distribution, p is the probability that an event will occur and q is the
probability that the event will not occur. If p and q remain constant in each of a given number (n)
of individual independent trials, then since p + q = 1, the probability series is described by
the general expression (p + q)ⁿ. The individual terms are given by the binomial expansion:

$$P_x = \binom{n}{x} p^x q^{(n-x)} = \left(\frac{n!}{x!\,(n-x)!}\right) \left(q^{(n-x)} p^x\right)$$

where $P_x$ is the probability of finding x individuals in a sample, n is the number of times
the test is repeated and n! means factorial n (e.g. 5! = 5 × 4 × 3 × 2 × 1 = 120). The
factorial term merely counts the number of mutually exclusive ways to obtain a particular
outcome. The population parameters of mean (μ) and variance (σ²) are given by
μ = np and σ² = npq.
The binomial distribution is used as a model when a specific characteristic of an indi-
vidual in a sample can be recognized (e.g. the prevalence of defective samples in a lot).
The distribution is often used as the basis for drawing up sampling schemes in Acceptance
Sampling (qv) of foods and other materials. In such schemes, q is defined as the prob-
ability that any one sample will not be defective (e.g. that it will not be contaminated),
p is the probability that the sample will be defective (e.g. will be contaminated) and n is
the number of samples tested. The expected probabilities, derived from the expansion of
(p + q)ⁿ, can be calculated (Example 3.2) or obtained from Tables of Binomial Probability,
such as those given by the National Bureau of Standards (1950); Fisher and Yates (1974)
and Pearson and Hartley (1966).

EXAMPLE 3.2 CALCULATION OF EXPECTED FREQUENCIES FOR A POSITIVE BINOMIAL DISTRIBUTION

In a sample of 100 farmed trout, the mean level of Clostridium botulinum spores detected
was 2 spores/fish. It is widely believed that the maximum likely prevalence of contamination
is 10 spores/fish (n = 10). What is the frequency distribution and what are the
chances of not detecting the organism?
We can use the binomial distribution to determine the probability (P) that no contamination
is detectable ($P_{(x=0)}$), or that 1, 2, …, 10 spores will be detected ($P_{(x=1)}$, $P_{(x=2)}$,
…, $P_{(x=10)}$). To do this, a sample estimate ($\hat{p}$) of the overall population value in relation to
the maximum contamination level expected, is derived as follows:
The probability of contamination is given by

$$\hat{p} = \frac{\bar{x}}{n} = \frac{2}{10} = 0.2; \quad \text{hence } \hat{q} = 1 - \hat{p} = 0.8$$


Expected probabilities are given by the expansion of

$$(\hat{p} + \hat{q})^n = (0.2 + 0.8)^{10}, \quad \text{i.e. } P_{(x)} = \frac{n!}{x!\,(n-x)!} \left[\hat{q}^{(n-x)} \hat{p}^{x}\right]$$

For the values given, the probability of detecting no Cl. botulinum spores in a fish is

$$P_{(x=0)} = \frac{10!}{0!\,(10-0)!} (0.8^{10-0} \times 0.2^{0}) = 0.8^{10} = 0.1074$$

The probability of detecting 1 spore/fish is

$$P_{(x=1)} = \frac{10!}{1!\,(10-1)!} (0.8^{10-1} \times 0.2^{1}) = 10 \times 0.8^{9} \times 0.2^{1} = 0.2684$$

The probability of detecting 2 spores/fish is

$$P_{(x=2)} = \frac{10!}{2!\,(10-2)!} (0.8^{10-2} \times 0.2^{2}) = 45 \times 0.8^{8} \times 0.2^{2} = 0.3020$$

The probabilities for 3, 4, 5, 6, 7, 8, 9 and 10 spores/fish are derived similarly.

The total probability ≈ 1.0. The expected frequencies of occurrence of Cl. botulinum
spores in 100 fish are given by f = P(x)N.

x (spores/fish)   P(x)     f = NP(x)   f (as integer)*

 0                0.1074   10.74       11
 1                0.2684   26.84       27
 2                0.3020   30.20       30
 3                0.2013   20.13       20
 4                0.0881    8.81        9
 5                0.0264    2.64        3
 6                0.0055    0.55        1
 7                0.0008    0.08        0
 8                0.0001    0.01        0
 9                **        –           0
10                **        –           0
Total             1.0000   100.00      101

* That is, to the nearest whole number.
** < 0.0001.
Hence with a mean contamination level of 2 spores/fish, and a probable maximum of 10
spores/fish, the probability of not detecting any Cl. botulinum spores would be 11/100, or
slightly more than 1 in 10. This theme is developed further in Chapters 5 and 8 in rela-
tion to sampling schemes and the use of presence or absence tests for specific organisms.
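The expected frequencies in Example 3.2 can be reproduced with a short script. This is a sketch only; `binomial_terms` is a hypothetical helper, and the values n = 10, p̂ = 0.2 and N = 100 are those of the example.

```python
from math import comb


def binomial_terms(n, p):
    """Individual terms P(x), x = 0..n, of the expansion of (p + q)^n."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]


n, p_hat, N = 10, 0.2, 100
probs = binomial_terms(n, p_hat)
expected = [N * P for P in probs]

for x, (P, f) in enumerate(zip(probs, expected)):
    print(f"{x:2d}  {P:.4f}  {f:6.2f}  {round(f)}")
```

Rounding each expected frequency to a whole number is what makes the integer column total 101 rather than 100.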


Figure 3.4 shows the probability distribution for different values of the probability for
a successful test (p) and different numbers of trials (n). As the values of n and p increase
the shape of the binomial distribution approaches the bell-shaped curve, described as the
Normal (or Gaussian) distribution. This is the distribution which often describes continu-
ous variables (i.e. measurements rather than counts).

THE NORMAL DISTRIBUTION

The Normal distribution refers to a family of distributions that have the same generic
shape: they are symmetrical curves with more values concentrated in the centre of the
curve and fewer in the tails (Fig. 3.5). The shape of the curve is described by the general
expression:

$$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where f(X) = the probability density of the derived variable X (the standard normal deviate)
with X = (x − μ)/σ, where x = the random variable (i.e. observed value), μ = population
mean and σ = standard deviation of the population.
An estimate of the value of the standard normal deviate (X) can be determined from
sample data (x) using $Z = (x - \bar{x})/s$. In other words, by subtracting the mean value ($\bar{x}$)
from each observed value (x) and dividing by the estimate of the standard deviation (s) a
series of standardized deviates (Z) is derived. These can be obtained from Tables of the
Standardized Normal Deviate (Pearson and Hartley, 1966).
The Central Limit Theorem states that if a large number of random variables (i.e. samples)
is selected from (almost) any distribution, with mean μ and variance σ², then the means
of these samples will themselves follow a normal distribution with a mean $\mu_{\bar{x}}$ and a standard
deviation $\sigma_{\bar{x}}$, which is the standard error of the mean, with $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Hence, any
distribution will approach the normal distribution as the sample size increases, provided
that the mean and variance are independent of one another. As will be seen below, distributions
such as the Poisson and the Binomial tend to approach normality when the number of
samples tested tends to infinity.
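The behaviour of sample means can be illustrated by simulation. The sketch below is an assumption-laden toy, not part of the original text: it draws samples from an exponential distribution (chosen only because it is strongly skewed, with μ = σ = 1) and compares the spread of the sample means with σ/√n. The function name `sample_means` and the seed are arbitrary.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so that the sketch is repeatable


def sample_means(n, repeats=2000):
    """Means of `repeats` samples of size n from an exponential distribution (mu = sigma = 1)."""
    return [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(repeats)]


for n in (4, 25, 100):
    means = sample_means(n)
    print(f"n={n:3d}  mean of means={mean(means):.3f}  "
          f"sd of means={stdev(means):.3f}  sigma/sqrt(n)={1 / n**0.5:.3f}")
```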
A unique property of the normal distribution is that the mean and variance are independent
and the shape of the distribution is a function of the population variance parameter.
From Fig. 3.5 it can be seen that 95.45% of observations occur within ±2 standard
deviations (σ) from the mean (μ) and that 99.73% lie within ±3σ from the mean.
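The coverage figures for the normal curve, and the standardization of sample data, can be sketched as follows. The function names are hypothetical; the coverage formula P(|Z| < k) = erf(k/√2) is the standard relation between the normal integral and the error function.

```python
from math import erf, sqrt


def coverage(k):
    """Proportion of a Normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))


def z_scores(values):
    """Standardized deviates Z = (x - mean) / s, with s the sample standard deviation."""
    n = len(values)
    m = sum(values) / n
    s = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / s for v in values]


for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {100 * coverage(k):.2f}%")
```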
The normal distribution is rarely a suitable model for microbiological data as measured,
but it is very important because many parametric statistical tests are based on the normal
distribution. Such tests include analysis of variance (ANOVA) and tests for significance of
differences. Thus, for microbiological data, steps have to be taken to transform the data
such that they become ‘normally’ distributed.


[Figure 3.4: bar charts of Frequency (%) against x. Top row: n = 5 with p = 0.1, 0.2, 0.3, 0.4 and 0.5. Middle row: p = 0.5 with n = 10 and n = 20. Bottom row: n = 20 with p = 0.1 and p = 0.3.]

FIGURE 3.4 Binomial probability distributions for various values of p and n in the expansion (p + q)ⁿ.


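[Figure 3.5: bell-shaped curve of % frequency ($f_x$) against X = (x − μ)/σ, for X from −3 to +3; 95.45% of the area lies within ±2 and 99.73% within ±3.]

FIGURE 3.5 A Normal (Gaussian) Distribution Curve described by the equation $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/2\sigma^2}$, where μ = population mean with variance σ².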

THE POISSON DISTRIBUTION (σ² = μ)

The arithmetic mean of the discrete binomial distribution is given by μ = np and the
variance by σ² = npq = μq. Since q = 1 − p = 1 − (μ/n), then σ² = μq = μ(1 − (μ/n)) =
μ − (μ²/n). Hence if n is finite, the variance will always be less than the mean (σ² < μ).
When the probability that an event will occur is low (p → 0) and n approaches infinity
(n → ∞) with np fixed and finite, then the binomial distribution approaches the discrete
Poisson distribution. Since σ² = μ(1 − (μ/n)), as n → ∞, μ/n → 0, and σ² → μ. The Poisson
distribution is described by the equation $P_x = e^{-\lambda}(\lambda^x / x!)$, where $P_x$ is the probability that
x individuals occur in a sampling unit, λ is the Poisson parameter (λ = μ = σ²) and e is
the exponential value (2.7183). Unlike the binomial distribution, which is a function of
two parameters (n and p), only one parameter (λ) is needed for the Poisson series, since
λ = μ = np. The Poisson parameter (λ) is estimated by the mean value (m) where m = s²,
thus $P_x = e^{-m}(m^x / x!)$.
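The convergence of the binomial to the Poisson can be seen numerically by holding np fixed while n grows. The sketch below is illustrative; the value μ = 2 and the function names are arbitrary choices, not taken from the text.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Binomial term P(x) for n trials with probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


def poisson_pmf(x, m):
    """Poisson term P(x) = e^-m * m^x / x!."""
    return exp(-m) * m**x / factorial(x)


mu = 2.0  # np is held fixed while n grows and p shrinks
for n in (10, 100, 10_000):
    p = mu / n
    variance = n * p * (1 - p)  # equals mu - mu**2 / n, below mu for any finite n
    print(f"n={n:6d}  variance={variance:.4f}  "
          f"P(0) binomial={binomial_pmf(0, n, p):.4f}  Poisson={poisson_pmf(0, mu):.4f}")
```

As n increases, the binomial variance climbs towards μ and the individual terms approach the Poisson values.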


The probabilities of 0, 1, 2, 3, etc. individuals per sampling unit are given by the individual
terms of the expansion of this equation, thus:

$$P_{(x=0)} = e^{-m}$$

$$P_{(x=1)} = e^{-m}\,\frac{m}{1!} = P_{(x=0)}\,(m)$$

$$P_{(x=2)} = e^{-m}\,\frac{m^2}{2!} = P_{(x=1)} \left(\frac{m}{2}\right)$$

$$P_{(x=3)} = e^{-m}\,\frac{m^3}{3!} = P_{(x=2)} \left(\frac{m}{3}\right), \text{ etc.}$$

This can be generalized as $P_{(x+1)} = P_{(x)}\,\dfrac{m}{(x+1)}$.
The individual terms can be calculated (Example 3.3) or may be obtained from standard
Tables; for example Pearson and Hartley (1966) give individual terms of the Poisson distribution
for different values of m from 0.1 to 15.0. The expected frequency ($nP_x$) is obtained
by multiplying each term of the series ($P_x$) by the number of sampling units (n).

EXAMPLE 3.3 CALCULATION OF THE EXPECTED FREQUENCIES OF A POISSON DISTRIBUTION

It is intended to inoculate 1000 bottles of meat slurry with a spore suspension at an
average level of 10 spores/bottle. What are the expected frequencies for (a) less than 1
spore, (b) less than 5 spores and (c) more than 15 spores/bottle?
For the intended mean inoculum level = m = 10 with N = 1000, the probability of 0
spores/bottle (i.e. <1 per bottle) is

$$P_{(x=0)} = e^{-m} = e^{-10} = 0.0000454$$

Hence the expected frequency = NP(x) = 0.0454.

An alternative way to express this would be that only 1 in 22,000
bottles would be expected not to contain at least one spore.
The succeeding terms of the Poisson series are used to calculate the remaining
probabilities.


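For simplicity, the generalized equation $P_{(x+1)} = P_{(x)} \cdot m/(x+1)$ is used:

$$P_{(x=0)} = e^{-m} = e^{-10} = 0.0000454$$

$$P_{(x=1)} = P_{(x=0)}\,\frac{m}{x+1} = 0.0000454 \times \frac{10}{1} = 0.000454$$

$$P_{(x=2)} = 0.000454 \times \frac{10}{2} = 0.00227$$

$$P_{(x=3)} = 0.00227 \times \frac{10}{3} = 0.00757$$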
TABLE 3.3
Individual terms (x = 0 … 24) of a Poisson Distribution for m = 10 and n = 1000,
Where f is the Expected Frequency of Occurrence of a Spore Inoculum (for Details See
Example 3.3). The Values of Px and f are Shown to Four Significant Places

x    Px (derivation)           Px           f = nPx    f (as integer)

 0   P(x=0) = e^-10            0.0000454    0.04540      0
 1   P(x=1) = P(x=0) × 10/1    0.0004540    0.4540       0
 2   P(x=2) = P(x=1) × 10/2    0.002270     2.270        2
 3   P(x=3) = P(x=2) × 10/3    0.007567     7.567        8
 4   P(x=4) = P(x=3) × 10/4    0.01892      18.92        19
 5   P(x=5) = P(x=4) × 10/5    0.03783      37.83        38
 6   P(x=6) = P(x=5) × 10/6    0.06306      63.06        63
 7   P(x=7) = P(x=6) × 10/7    0.09008      90.08        90
 8   P(x=8) = P(x=7) × 10/8    0.1126       112.6        113
 9   Etc.                      0.1251       125.1        125
10                             0.1251       125.1        125
11                             0.1137       113.7        114
12                             0.09478      94.78        95
13                             0.07291      72.91        73
14                             0.05208      52.08        52
15                             0.03472      34.72        35
16                             0.02170      21.70        22
17                             0.01276      12.76        13
18                             0.007091     7.091        7
19                             0.003732     3.732        4
20                             0.001866     1.866        2
21                             0.0008891    0.8891       1
22                             0.0004039    0.4039       0
23                             0.0001756    0.1756       0
24                             0.00007317   0.07317      0
Total NP(x) = 999.90717

Sum of f (as integer) for x = 0 to 4: 29
Sum of f (as integer) for x = 16 to 24: 49

From the data in Table 3.3, the cumulative probability of less than 5, and more than 15,
spores/bottle would be P = 0.0293 and P = 0.0487, respectively. Hence, in 1000 replicate
inoculated bottles the expected frequencies would be 29 and 49, respectively. Thus 922 of
the 1000 bottles would be expected to contain between 5 and 15 spores/bottle.
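The Poisson terms and the two tail probabilities for m = 10 can be generated with the recurrence $P_{(x+1)} = P_{(x)} \cdot m/(x+1)$. This is a minimal sketch; the function name `poisson_terms` is hypothetical, and the upper tail is taken as a complement so that terms beyond x = 24 are not lost.

```python
from math import exp


def poisson_terms(m, x_max):
    """P(0)..P(x_max) via the recurrence P(x+1) = P(x) * m / (x + 1)."""
    terms = [exp(-m)]
    for x in range(x_max):
        terms.append(terms[-1] * m / (x + 1))
    return terms


m, N = 10, 1000
P = poisson_terms(m, 24)

p_lt_5 = sum(P[:5])         # x = 0..4
p_gt_15 = 1 - sum(P[:16])   # complement of x = 0..15

print(f"P(<5 spores)  = {p_lt_5:.4f}, about {round(N * p_lt_5)} bottles in {N}")
print(f"P(>15 spores) = {p_gt_15:.4f}, about {round(N * p_gt_15)} bottles in {N}")
```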


Since the Poisson distribution is associated with rare events which can be considered to
occur randomly, for example the distribution of industrial accidents over a long time period
or the distribution of small numbers of bacteria in a large quantity of food, then tests for
agreement with a Poisson distribution will be tests for randomness of distribution. We have
noted above that the Poisson distribution is a special version of the binomial distribution
where p → 0 as n → ∞ and σ² → μ.
Certain conditions must be met if the Poisson series is to be used as a mathematical
model for bacterial counts in a food sample:

1. The number of individual organisms per sampling unit (k) must be well below the maximum
possible number that could occur (k → ∞).
2. The probability that any given position in the sampling unit is occupied by an organism
is both constant and very small (constant p → 0); consequently, the probability that that
position is not occupied by a particular organism is high (q → 1).
3. The presence of an individual organism in any given position must neither increase nor
decrease the probability that another organism occurs nearby.
4. The sizes of the samples must be small relative to the whole population.

The first condition implies that food samples showing high levels of contamination, or cul-
ture plates with large numbers of colonies, might be expected not to conform to a Poisson
distribution but the Poisson tends to the normal distribution when λ is large. The second
condition implies that there must be an equal chance that any one organism will occur at
any one point in the food sample or the culture. This condition is fulfilled only if the indi-
viduals are distributed randomly. The third criterion implies that if bacterial cells have rep-
licated then more than one organism will occur within a given location, hence randomness
is unlikely once replication occurs.
In a broth culture or in a well-mixed suspension of a liquid food, such as milk, the total
volume occupied by 1,000,000 (10⁶) bacterial cells is only about 1 part in 10⁶ of the total
volume of the liquid, hence it is reasonable to assume that the cells will occur randomly. In
a solid food sample (e.g. a minced meat) contamination occurs randomly throughout and
the condition would be met; but if the surface of a piece of meat is contaminated more than
the deep tissues, randomness might apply only after total maceration of the sample in diluent.
However, if contamination were relatively light, the distribution of organisms could
be random as judged by use of some suitable surface sampling technique. When individual
organisms are not well separated the variance will be less than the mean ($s^2 < \bar{x}$) and the
binomial might be a more suitable model for the distribution.
Tests to determine whether the Poisson distribution provides a good description of a set
of data are given in Chapter 4. If clumping of organisms occurs, the third condition will not
be met and it is probable that variance will be greater than the mean ($s^2 > \bar{x}$). In theory,
the removal of a sample from a finite population will affect the value for P in the next sam-
ple unit. However, if the sample forms only a minute proportion of the total ‘lot’ size, then
this effect will be minimal and the value of P would not alter significantly from one exami-
nation to the next.


THE NEGATIVE BINOMIAL DISTRIBUTION (σ² > μ)

If the second and third conditions for use of the Poisson distribution are not fulfilled, the
variance of the population will usually be greater than the mean (σ² > μ). This is particularly
the case in microbiology where aggregates and cell clumps occur both in natural samples
and in dilutions, slide preparations, etc. Of the various mathematical models available,
the negative binomial is frequently the best model to describe the distribution frequencies
obtained (Jones et al., 1948; Bliss and Fisher, 1953; Gurland, 1959; Takahashi et al., 1964;
Dodd, 1969) but other complex distributions may be appropriate.
The negative binomial, which describes the number of failures before the xth success
when n is the integer, is the mathematical counterpart of the (positive) binomial, and is
described by the expansion of $(q - p)^{-k}$, where $q = 1 + p$ and $p = \mu/k$. The parameters of
the equation are the mean $\mu$ and the exponent $k$. Unlike exponent $n$ in the binomial series,
the exponent k of the negative binomial is neither an integer nor is it the maximum possible
number of individuals that could occur in a sample population. Instead it is related to the
spatial or temporal distribution of the organisms in the sample and takes into account the
effects of cell clumps and aggregates.
The variance of the population is given by $\sigma^2 = kpq = \mu q = \mu(1 + (\mu/k)) = \mu + (\mu^2/k)$.
Therefore, the reciprocal of the constant $k$ (i.e. $1/k$) is a measure of the excess
variance or clumping of individuals in the population. As $1/k \to 0$ and $k \to \infty$, the distribution
converges to Poisson, with $\sigma^2 = \mu$. Conversely, if clumping is dominant, $k \to 0$ and therefore
$1/k \to \infty$; the distribution converges on the logarithmic distribution (Fisher et al., 1953).
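The approach to the Poisson limit can be checked numerically. The following is a minimal sketch, assuming only the variance relation $\sigma^2 = \mu + (\mu^2/k)$ given above; the function name `negbin_variance` and the chosen values of k are illustrative and not part of the original text.

```python
def negbin_variance(mu: float, k: float) -> float:
    """Variance of a negative binomial with mean mu and exponent k."""
    return mu + mu ** 2 / k


mu = 10.0
# As k grows, the excess variance mu^2/k shrinks and the variance tends to mu (Poisson).
for k in (1.9, 5.0, 10.0, 50.0, 1000.0):
    print(f"k = {k:7.1f}   variance = {negbin_variance(mu, k):8.3f}")
```

Small values of k give a variance several times the mean, whereas for very large k the variance is almost indistinguishable from the mean.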
Applications of the negative binomial distribution within biological sciences are numerous.
One such was the demonstration that the numbers of microbial colonies in soil follow
a Poisson distribution and the numbers of organisms within colonies follow a logarithmic
distribution, so that the distribution of all organisms in soil conforms to a negative binomial
(Jones et al., 1948).
The individual terms of the expansion of $(q - p)^{-k}$ are given by:

$$P_x = \left(1 + \frac{\mu}{k}\right)^{-k} \left(\frac{(k + x - 1)!}{x!\,(k - 1)!}\right) \left(\frac{\mu}{\mu + k}\right)^{x}$$

where Px is the probability that x organisms occur in a sample unit. As in other distribu-
tions, the expected frequency of a particular count is NPx, where N is the number of sam-
ple units.
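Because k is generally not an integer, the factorials in the middle term cannot be evaluated directly. A minimal sketch follows, assuming the standard identity $(n - 1)! = \Gamma(n)$ so that the term becomes $\Gamma(k + x)/(\Gamma(x + 1)\Gamma(k))$; the function name `negbin_pmf` is hypothetical and the calculation is done on the log scale to limit rounding problems.

```python
from math import exp, lgamma, log


def negbin_pmf(x: int, mu: float, k: float) -> float:
    """Probability P_x that x organisms occur in a sample unit."""
    # log of (k + x - 1)! / (x! (k - 1)!) written with gamma functions
    log_coeff = lgamma(k + x) - lgamma(x + 1) - lgamma(k)
    log_p = -k * log(1 + mu / k) + log_coeff + x * log(mu / (mu + k))
    return exp(log_p)


# Expected frequency of a count of x among N sample units is N * P_x.
N = 400
for x in range(4):
    print(x, round(N * negbin_pmf(x, mu=2.4175, k=3.84), 2))
```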
A very simple method of deriving an approximate value for $k$ can be obtained by
rearranging the equation for variance of a negative binomial, $\sigma^2 = \mu + (\mu^2/k)$, hence
$k = \mu^2/(\sigma^2 - \mu)$. Since the statistical estimates of the parameters $\mu$ and $\sigma^2$ are $\bar{x}$ and
$s^2$, respectively, an estimate ($\hat{k}$) of the value of $k$ can be derived from $\hat{k} = \bar{x}^2/(s^2 - \bar{x})$.
Anscombe (1950) showed that this method is inefficient because it does not give a reliable
estimate for values of $\hat{k}$ below 4 unless $\bar{x}$ is also less than 4.


Estimations of the population parameters $\mu$ and $k$ are obtained from the frequency
distribution statistics $\bar{x}$ and $\hat{k}$, the arithmetic mean ($\bar{x}$) being derived in the normal way
(Example 2.1). Determination of $\hat{k}$ is much more complex. Several methods have been
proposed (Anscombe, 1949, 1950; Bliss and Fisher, 1953; Debauche, 1962; Dodd, 1969) to
obtain an approximate estimate, which can then be used in a maximum likelihood method
to obtain an accurate estimate for $\hat{k}$ (Example 3.4).

EXAMPLE 3.4 CALCULATION OF $\hat{k}$ AND DERIVATION OF THE EXPECTED FREQUENCY
DISTRIBUTION OF A NEGATIVE BINOMIAL FOR DATA FROM MICROSCOPIC COUNTS OF
BACTERIAL CELLS IN MILK (DATA OF MORGAN ET AL., 1951)

The observed frequency distribution of bacterial cells per field in a milk smear was:

Number of cells (x)     0    1    2    3    4    5    6    7    8    9   10  >10  Total

f                      56  104   80   62   42   27    9    9    5    3    2    1    400
fx                      0  104  160  186  168  135   54   63   40   27   20   10    967
A_x                   344  240  160   98   56   29   20   11    6    3    1    0

Where $x$ = the cell count, $f$ = observed frequency of that count and $A_x$ is the cumulative
frequency of counts exceeding $x$. The total number ($N$) of sample units (i.e. fields counted)
is given by $N = \sum f = 400$ and the total of $A_x = \sum fx = 967$.
The arithmetic mean count is given by $\bar{x} = \sum fx / \sum f = 967/400 = 2.4175$, and the
variance by $s^2 = \{\sum fx^2 - \bar{x} \sum fx\}/(n - 1) = \{3957 - 2.4175(967)\}/399 = 4.0583$.

Estimation of $\hat{k}$ (Method 1)

This simple method is based on the equation for variance of a negative binomial. The
population variance is given by $\sigma^2 = \mu + (\mu^2/k)$ and therefore $k = \mu^2/(\sigma^2 - \mu)$. Substituting
the sample statistics for the population parameters we get $\hat{k} = \bar{x}^2/(s^2 - \bar{x})$. The method
is not very efficient for values of $k < 4$ but provides an approximation that can be used in
other methods.
For our data,

$$\hat{k}_1 = \frac{\bar{x}^2}{s^2 - \bar{x}} = \frac{(2.4175)^2}{4.0583 - 2.4175} = 3.5619 \approx 3.6$$
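The arithmetic of Method 1 can be reproduced from the frequency table. This is a sketch only: the variable names are illustrative, and the class '>10' is scored as 10 because that is how the fx row of the table treats it.

```python
# Counts per field (x) and observed frequencies (f) from the table above.
# The final class (">10") is scored as 10, following the fx row of the example.
counts = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10]
freqs = [56, 104, 80, 62, 42, 27, 9, 9, 5, 3, 2, 1]

n = sum(freqs)                                           # number of fields
sum_fx = sum(f * x for f, x in zip(freqs, counts))       # sum of f*x
sum_fx2 = sum(f * x * x for f, x in zip(freqs, counts))  # sum of f*x^2

mean = sum_fx / n
variance = (sum_fx2 - mean * sum_fx) / (n - 1)
k_moment = mean ** 2 / (variance - mean)                 # moment estimate of k

print(f"mean = {mean:.4f}, variance = {variance:.4f}, k = {k_moment:.4f}")
```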


Test for efficiency: $\hat{k}/\bar{x} = 3.5619/2.4175 = 1.47$, which is less than 6, and

$$\frac{(\hat{k} + \bar{x})(\hat{k} + 2)}{\bar{x}} = \frac{(3.5619 + 2.4175)(3.5619 + 2)}{2.4175} = 13.76$$

which is less than 15.
Hence the value for $\hat{k}$ is not efficient.
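The two checks can be wrapped in a small helper. In this sketch the thresholds 6 and 15 are those quoted in the example; treating the estimate as efficient when either threshold is reached is an assumption, since the example only shows the case in which both checks fail. The function name is hypothetical.

```python
def moment_estimate_checks(k_hat: float, mean: float):
    """Return (efficient, ratio, combined) for the moment estimate of k."""
    ratio = k_hat / mean
    combined = (k_hat + mean) * (k_hat + 2) / mean
    # Assumption: either criterion reaching its threshold is taken as sufficient.
    efficient = ratio >= 6 or combined >= 15
    return efficient, ratio, combined


efficient, ratio, combined = moment_estimate_checks(3.5619, 2.4175)
print(f"ratio = {ratio:.2f}, combined = {combined:.2f}, efficient = {efficient}")
```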

Estimation of $\hat{k}$ by the maximum likelihood method

The maximum likelihood equation is

$$N \log_e\left(1 + \frac{\bar{x}}{\hat{k}}\right) = \sum \left(\frac{A_x}{\hat{k} + x}\right)$$

where $N$ = the total number of sampling units (i.e. microscopic fields examined), $A_x$ is the
total number of counts exceeding $x$ and $\log_e$ is the natural logarithm.
Different values of $\hat{k}$ are tried until the equation is balanced by iteration.
We solve each side of the equation, initially using the approximation of $\hat{k}_1 = 3.6$
(derived by Method 1). Solving first the left hand side of the equation:

$$N \log_e\left(1 + \frac{\bar{x}}{\hat{k}_1}\right) = 400 \log_e\left(1 + \frac{2.4175}{3.6}\right) = 205.496$$

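Solving the right hand side of the equation:

$$\sum \left(\frac{A_x}{\hat{k} + x}\right) = \frac{A_{x=0}}{\hat{k}} + \frac{A_{x=1}}{\hat{k} + 1} + \frac{A_{x=2}}{\hat{k} + 2} + \frac{A_{x=3}}{\hat{k} + 3} + \dots + \frac{A_{x=10}}{\hat{k} + 10}$$

$$= \frac{344}{3.6} + \frac{240}{4.6} + \frac{160}{5.6} + \frac{98}{6.6} + \dots + \frac{1}{13.6} = 205.84$$

Using $\hat{k}_1 = 3.6$, the two sides of the equation differ by $-0.344$ (i.e. the left hand side is
less than the right hand side).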
We now select a larger value (e.g. 5.0) for $\hat{k}_3$ and again solve the two sides of the
equation:

$$N \log_e\left(1 + \frac{\bar{x}}{\hat{k}_3}\right) = 400 \log_e\left(1 + \frac{2.4175}{5.0}\right) = 157.76$$

$$\sum \left(\frac{A_x}{\hat{k} + x}\right) = \frac{344}{5} + \frac{240}{6} + \frac{160}{7} + \frac{98}{8} + \dots + \frac{1}{15} = 156.51$$


For this trial value of $\hat{k}_3$, the difference between the two sides is $+1.65$.
We then arrange the data as follows:

                 K      Difference

$\hat{k}_1$      3.6    -0.34
$\hat{k}_2$      ?       0.00
$\hat{k}_3$      5.0    +1.65

Then,

$$\frac{\hat{k}_2 - 3.6}{5.0 - 3.6} = \frac{0 - (-0.34)}{1.65 - (-0.34)} = \frac{0.34}{1.99} = 0.171$$

Hence,
$\hat{k}_2 - 3.6 = 1.4 \times 0.171 = 0.239$ and $\hat{k}_2 = 3.6 + 0.239 = 3.84$

The distribution of these counts can therefore be described by the statistics:

$\bar{x} = 2.4175$; $s^2 = 4.0583$; and $\hat{k} = 3.84$
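The trial-and-error balancing can also be automated. The sketch below uses simple bisection between the two trial values 3.6 and 5.0 as an assumed substitute for the hand interpolation; the names `imbalance` and `solve_k` are hypothetical. Because linear interpolation between two widely spaced trial values is itself an approximation, the root located numerically need not coincide exactly with the interpolated value.

```python
from math import log

N = 400            # number of fields counted
MEAN = 2.4175      # arithmetic mean count
# A_x for x = 0..10: number of fields with a count exceeding x
A = [344, 240, 160, 98, 56, 29, 20, 11, 6, 3, 1]


def imbalance(k: float) -> float:
    """Left hand side minus right hand side of the maximum likelihood equation."""
    lhs = N * log(1 + MEAN / k)
    rhs = sum(a / (k + x) for x, a in enumerate(A))
    return lhs - rhs


def solve_k(lo: float = 3.6, hi: float = 5.0, tol: float = 1e-6) -> float:
    """Bisection: halve the bracket until it is narrower than tol."""
    f_lo = imbalance(lo)
    if f_lo * imbalance(hi) > 0:
        raise ValueError("trial values do not bracket a solution")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = imbalance(mid)
        if f_lo * f_mid > 0:
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2


k_ml = solve_k()
print(f"k = {k_ml:.3f}, imbalance = {imbalance(k_ml):.6f}")
```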

The negative binomial distribution curve

The distribution curve can be derived from the probability function equation:

$$P_{(x)} = \left(1 + \frac{\bar{x}}{\hat{k}}\right)^{-\hat{k}} \left(\frac{(\hat{k} + x - 1)!}{x!\,(\hat{k} - 1)!}\right) \left(\frac{\bar{x}}{\bar{x} + \hat{k}}\right)^{x}$$

The probability of 0 bacteria/field is given by:

$$P_{(x=0)} = \left(1 + \frac{2.4175}{3.84}\right)^{-3.84} \left(\frac{(3.84 + 0 - 1)!}{0!\,(3.84 - 1)!}\right) \left(\frac{2.4175}{2.4175 + 3.84}\right)^{0} = \left(1 + \frac{2.4175}{3.84}\right)^{-3.84}$$

The calculation is simplified by taking logs of both sides,

$$\log P_{x=0} = -3.84 \times \log\left(1 + \frac{2.4175}{3.84}\right) = -3.84 \times \log(1.6296) = -0.8144$$

therefore $P_{x=0} = \text{antilog}(-0.8144) = 0.153$


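The expected frequency of zero cell counts is: $NP_{(x=0)} = 400 \times 0.153 = 61.2$
The probability of 1 bacterial cell/field is given by:

$$\begin{aligned}
P_{(x=1)} &= \left(1 + \frac{\bar{x}}{k}\right)^{-k} \left(\frac{(k + 1 - 1)!}{1!\,(k - 1)!}\right) \left(\frac{\bar{x}}{\bar{x} + k}\right)^{1} \\
&= P_{x=0} \times \left(\frac{(k + 1 - 1)!}{1!\,(k - 1)!}\right) \left(\frac{\bar{x}}{\bar{x} + k}\right)^{1} \\
&= P_{x=0} \times \frac{k}{1} \left(\frac{\bar{x}}{\bar{x} + k}\right)^{1}
\end{aligned}$$

Hence, $P_{x=1} = (0.153)(3.84)(0.3863)^1 = 0.227$. The expected frequency of a count of 1
bacterial cell/field is: $NP_{(x=1)} = 400 \times 0.227 = 90.8$

The probability of 2 bacterial cells/field is given by:

$$\begin{aligned}
P_{(x=2)} &= P_{x=0} \times \left(\frac{(k + 2 - 1)!}{2!\,(k - 1)!}\right) \left(\frac{\bar{x}}{\bar{x} + k}\right)^{2} \\
&= P_{x=0} \times \left(\frac{(k + 1)(k)}{2}\right) \left(\frac{\bar{x}}{\bar{x} + k}\right)^{2}
\end{aligned}$$

Hence, $P_{x=2} = (0.153) \left(\frac{4.84 \times 3.84}{2}\right) (0.3863)^2 = 0.212$

The expected frequency of a count of 2 bacterial cells/field is: $NP_{(x=2)} = 400 \times 0.212 = 84.8$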

This process is continued until $\sum P_{(x)} \approx 1$ and $\sum f \approx 400$, as illustrated in Table 3.4. A
plot of the observed and expected frequencies is given in Fig. 3.6.
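The successive terms need not be computed from scratch: each follows from the previous one, since $P_{(x)} = P_{(x-1)} \times \frac{k + x - 1}{x} \times \frac{\bar{x}}{\bar{x} + k}$, which is the pattern of the three calculations above. A minimal sketch of that recurrence follows; the function name `negbin_terms` is illustrative and full floating-point precision is kept throughout, so later terms may differ slightly from hand calculations based on rounded intermediate values.

```python
def negbin_terms(mean: float, k: float, x_max: int) -> list:
    """Probabilities P(0), P(1), ..., P(x_max) of a negative binomial."""
    p = (1 + mean / k) ** (-k)        # P(0)
    ratio = mean / (mean + k)
    terms = [p]
    for x in range(1, x_max + 1):
        p *= (k + x - 1) / x * ratio  # step from P(x - 1) to P(x)
        terms.append(p)
    return terms


terms = negbin_terms(mean=2.4175, k=3.84, x_max=10)
for x, p in enumerate(terms):
    print(f"x = {x:2d}   P = {p:.4f}   expected frequency = {400 * p:6.2f}")
```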

Note 1: In the equation

$$P_{(x)} = \left(1 + \frac{\bar{x}}{\hat{k}}\right)^{-\hat{k}} \left(\frac{(\hat{k} + x - 1)!}{x!\,(\hat{k} - 1)!}\right) \left(\frac{\bar{x}}{\bar{x} + \hat{k}}\right)^{x}$$

it is not necessary to derive the factorials in the second component. The component
$\left(\frac{(\hat{k} + x - 1)!}{x!\,(\hat{k} - 1)!}\right)$ simplifies as follows:

For $x = 0$: $\frac{(\hat{k} + 0 - 1)!}{0!\,(\hat{k} - 1)!} = \frac{(\hat{k} - 1)!}{1\,(\hat{k} - 1)!} = 1$

For $x = 1$: $\frac{(\hat{k} + 1 - 1)!}{1!\,(\hat{k} - 1)!} = \frac{\hat{k}\,(\hat{k} - 1)!}{1\,(\hat{k} - 1)!} = \hat{k}$

For $x = 2$: $\frac{(\hat{k} + 2 - 1)!}{2!\,(\hat{k} - 1)!} = \frac{(\hat{k} + 1)(\hat{k})(\hat{k} - 1)!}{2\,(\hat{k} - 1)!} = \frac{(\hat{k} + 1)\hat{k}}{2}$

For $x = 3$: $\frac{(\hat{k} + 3 - 1)!}{3!\,(\hat{k} - 1)!} = \frac{(\hat{k} + 2)(\hat{k} + 1)(\hat{k})(\hat{k} - 1)!}{6\,(\hat{k} - 1)!} = \frac{(\hat{k} + 2)(\hat{k} + 1)\hat{k}}{6}$

Note 2: Computer algorithms are now available for fitting negative binomial models to
experimental data. The ‘goodness-of-fit’ between the observed and calculated distributions
(e.g. Table 3.4) can be tested using $\chi^2$ (see Chapter 4). It is essential that in all
calculations to derive $k$, $P_{(x)}$, etc. at least seven significant figures are retained in the
calculator to avoid ‘rounding’ errors.

TABLE 3.4
Individual Terms for Values of x from 0 to 10, for a Negative Binomial Distribution
with $\bar{x} = 2.4175$, $n = 400$ and $k = 3.84$ (for Details See Example 3.4)

Value x   Probability of occurrence, $P_x$                            Calculated    Frequency    Observed
                                                                        frequency     as integer   frequency
                                                                        ($nP_x$)      ($nP_x$)     ($f$)

0         $P_{(x=0)} = (1 + (\bar{x}/k))^{-k} = R^a$     = 0.1528      61.20         61           56
1         $P_{(x=1)} = R(k/1)(Y^b)$                      = 0.2269      90.79         91           104
2         $P_{(x=2)} = R[(k+1)(k)/2!](Y)$                = 0.2129      84.88         85           80
3         $P_{(x=3)} = R[(k+2)(k+1)(k)/3!](Y)$           = 0.1603      63.84         64           62
4         etc.                                           = 0.1058      42.17         42           42
5         etc.                                           = 0.06395     25.55         26           27
6                                                        = 0.03628     14.54         15           9
7                                                        = 0.01962     7.90          8            9
8                                                        = 0.01022     4.13          4            5
9                                                        = 0.005165    2.10          2            3
10                                                       = 0.0025471   1.04          1            2
>10                                                                                  1            1
Total     $\sum P$                                       = 0.9949      398.16        400          400

$^a$ $R = (1 + \bar{x}/k)^{-k}$.
$^b$ $Y = (\bar{x}/(\bar{x} + k))^{x}$.


[Figure 3.6 plot. Title: Negative binomial; y-axis: Number of occurrences; x-axis: Bacterial cells/field]

FIGURE 3.6 Negative binomial distribution showing the expected distribution (line)
and the observed (histogram) frequency distribution of counts of bacterial cells/field.

The approximate value for $\hat{k}$ can be substituted in a maximum likelihood equation:

$$N \log_e\left(1 + \frac{\bar{x}}{\hat{k}}\right) = \sum \left(\frac{A_x}{\hat{k} + x}\right)$$

where $N$ is the total number of sample units, $\log_e$ is the natural logarithm and
$A_x$ is the accumulated frequency (i.e. the total number of counts) exceeding $x$.
Different approximations for $\hat{k}$ are tried and the equation is balanced by iteration. The
method is illustrated in Example 3.4.
Tables of expected probabilities for 1480 negative binomial distributions, covering values
of $k$ from 0.1 to 200, have been published (Williamson and Bretherton, 1963). The tables
are arranged in order of increasing size of a parameter ‘p’ that is equivalent to $1/q$ (not $p$ as
used in the present discussion). The discrepancy arises because they used an alternative form
[$p^k(1 - q)^{-k}$] of the negative binomial equation where $p + q = 1$ (whereas in this text
the term $q - p = 1$ is used). An estimate of Williamson and Bretherton’s ‘p’ is given by:
$p = 1/(1 + \bar{x}/k)$.
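As a brief sketch of the conversion, assuming only the relation $p = 1/(1 + \bar{x}/k)$ stated above, the tabulated parameter can be obtained from the sample statistics as follows. The function name is hypothetical and the values are those of Example 3.4.

```python
def williamson_bretherton_p(mean: float, k: float) -> float:
    """Parameter 'p' of the published tables, equal to 1/q in this text's notation."""
    return 1.0 / (1.0 + mean / k)


p_tab = williamson_bretherton_p(mean=2.4175, k=3.84)
print(f"tabulated p = {p_tab:.4f}")
```

The same value results from $k/(k + \bar{x})$, which is an algebraically equivalent form.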
An example of a frequency distribution for a negative binomial is given in Fig. 3.6, which
shows a comparison of the observed and calculated frequency distributions.


RELATIONSHIP BETWEEN THE FREQUENCY DISTRIBUTIONS

The parameters of the various distributions are summarized in Table 3.5. Elliott (1977)
illustrated the general relationships among the binomial family distributions as:

Binomial family

Binomial ($s^2 < \bar{x}$)                                  Negative binomial ($s^2 > \bar{x}$)
$(p + q)^n$                                                 $(q - p)^{-k}$

      $k \to \infty$              $k \to \infty$                      $k \to 0$

‘Normal’                          Poisson                     Logarithmic series
($s^2$ and $\bar{x}$ are independent)     ($s^2 = \bar{x}$)
($p = q = 0.5$)                   ($p \to 0$, $q \to 1$)
Reproduced from Elliott (1977), by permission of the Freshwater Biological Association

This effect can be seen by comparison of the curves in Fig. 3.4 and Figs. 3.7–3.9. The
shapes of both the binomial (Fig. 3.4) and the negative binomial frequency distribution
curves, with $\mu = 10$ and $k = 1000$ (Fig. 3.8), are very similar to that of a Poisson distribution
with $\lambda = 10$ (Fig. 3.9). It can also be seen that the binomial is asymmetric for low
values of $p$ (or $q$) (Fig. 3.4), that negative binomial curves are asymmetric for low values
of $\mu$ and $k$ (Figs. 3.7 and 3.8), and that Poisson curves are asymmetric for low values of $\lambda$
(Fig. 3.9).

TRANSFORMATIONS

Whenever it is required to make comparisons of data (e.g. tests for the standard difference
between mean values), the parametric test methods require that the data conform to a
normal distribution, that is the variance of the sample should be independent of the mean
and the components of the variance (i.e. the variances due to actual differences between the
samples and those due to random error) should be additive.
The binomial distribution approximates to a normal distribution when the number of
sample units is large ($n > 20$) and the variance is greater than 3. Since $s^2 = npq$, the normal
approximation can be used when $p$ = 0.4–0.6 and $n$ is greater than 12, or when $p$ = 0.1–0.9
and $n > 33$. The normal approximation cannot be used if $n < 12$.
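The sample sizes quoted follow directly from the requirement that $npq$ exceeds 3. A minimal sketch, assuming only that rule (with $q = 1 - p$); the function name and the default threshold argument are illustrative:

```python
from math import floor


def smallest_n_for_normal_approx(p: float, min_variance: float = 3.0) -> int:
    """Smallest whole n for which the binomial variance n*p*(1 - p) exceeds min_variance."""
    return floor(min_variance / (p * (1 - p))) + 1


for p in (0.5, 0.4, 0.1, 0.9):
    n = smallest_n_for_normal_approx(p)
    print(f"p = {p}: n must be at least {n} (variance = {n * p * (1 - p):.2f})")
```

At the least favourable ends of the two ranges quoted, this gives n greater than 12 for p around 0.5 and n greater than 33 for p = 0.1 or 0.9.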


The Poisson distribution is asymmetric for low values of its parameter $\lambda$ (estimated
by $m = \bar{x} = s^2$) but approaches the Binomial when $\lambda$ is large and the Binomial itself
approaches the normal distribution when $n$ is large (Fig. 3.9). The normal approximation
to Poisson can generally be used when $\lambda > 10$. For small values of $k$, the negative binomial
distribution is asymmetric but it approaches normality for large values of $k$ when the
mean ($\mu$) is also large (e.g. $\mu = 10$, $k = 1000$ in Fig. 3.8).
The first condition of normality (i.e. symmetrical distribution of values around the mean)
can be attained by all three distributions in certain circumstances, so that some methods
associated with the normal distribution (e.g. standard error of the mean and confidence
limits) may then be applied. However, as mean and variance increase together in all three
distributions, the condition requiring independence of mean and variance can never be
fulfilled. Consequently, standard methods intended to answer questions such as ‘Does the
arithmetic mean value of one set of colony counts differ from that of a second set?’ and
other parametric tests cannot be done without risk of introducing considerable errors.

[Figure 3.7 plots. Three panels: k = 3.0 with m = 1.0, m = 4.9 and m = 9.5; y-axis: Frequency (%); x-axis: x]

FIGURE 3.7 Negative binomial distributions for $k = 3.0$ and for various mean values ($\mu$) based on expansion of
the formula $(q - p)^{-k}$.

[Figure 3.8 plots. Five panels: m = 10 with k = 1.9, 5, 10, 50 and 1000; y-axis: Frequency (%); x-axis: x]

FIGURE 3.8 Negative binomial distributions for $\mu = 10$ and values of $k$ from 1.9 to 1000 based on the
expansion of the formula $(q - p)^{-k}$.

[Figure 3.9 plots. Four panels: λ = 1, 5, 10 and 20; y-axis: Frequency (%); x-axis: Number of individuals (x) per sampling unit]

FIGURE 3.9 Poisson distribution series for values of $\lambda$ from 1 to 20. The frequency of each count is expressed
as percentage of total number of counts.
The problem can be overcome by transforming the data, using an appropriate mathematical
model such that the distribution frequency is normalized (see Fig. 3.10) and the interdependence
of mean and variance is removed. Plotting the mean against the variance on a log-log scale can
provide an assessment as to whether or not the mean and variance of the original and
transformed data are independent. If, as in Figs 9.1–9.3, the log variance increases with increasing
mean values, the mean and variance are not independent. Transformation also results in the
components of variance becoming additive, thereby permitting application of analysis of
variance. The choice of transformation to be used is governed by the frequency distribution of the
original data. In many routine operations, the number of sample units may be too small to
permit the data to be arranged in a frequency distribution. In such circumstances, the
relationship between the mean value ($\bar{x}$) and the variance ($s^2$) of the data can be used as a guide in the
choice of a suitable transformation (Table 3.5).


TABLE 3.5
Transformation Functions

Original distribution                       Transformation:
Known               Not known               replace ‘x’ with                                   Special conditions

Poisson             $s^2 = \bar{x}$         $\sqrt{x}$                                         No counts < 10
Poisson             $s^2 = \bar{x}$         $\sqrt{x + 0.5}$                                   Some counts < 10
Binomial            $s^2 < \bar{x}$         $\sin^{-1}\sqrt{x}$
Negative binomial                           $\sinh^{-1}\sqrt{(x + 0.375)/(k - 2(0.375))}$      $k > 5$
Negative binomial                           $\log(x + (k/2))$                                  $5 > k > 2$
Not known           $s^2 > \bar{x}$         $\log x$                                           No zero counts
Not known           $s^2 > \bar{x}$         $\log(x + 1)$                                      Some zero counts

Source: Modified from Elliott (1977).
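The rows of Table 3.5 can be expressed as small functions. This is a sketch under stated assumptions: the function names are hypothetical, logarithms are taken to base 10 (consistent with the values in Example 3.5), the binomial row is omitted, and values of k outside the two ranges given in the table raise an error rather than guessing a rule.

```python
from math import asinh, log10, sqrt


def transform_poisson(x: float, some_counts_below_10: bool) -> float:
    """Square-root transformation for Poisson data."""
    return sqrt(x + 0.5) if some_counts_below_10 else sqrt(x)


def transform_negbin(x: float, k: float) -> float:
    """Transformation for negative binomial data, chosen by the value of k."""
    if k > 5:
        return asinh(sqrt((x + 0.375) / (k - 2 * 0.375)))
    if 2 < k < 5:
        return log10(x + k / 2)
    raise ValueError("Table 3.5 gives no transformation for this value of k")


def transform_unknown(x: float, some_zero_counts: bool) -> float:
    """Logarithmic transformation when the distribution is unknown but s^2 > mean."""
    return log10(x + 1) if some_zero_counts else log10(x)


print(transform_poisson(4, some_counts_below_10=False))
print(round(transform_negbin(3, k=3.0), 3))
print(round(transform_unknown(1, some_zero_counts=True), 3))
```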

Although in some microbiological situations the distribution of microorganisms conforms
to Poisson, in most circumstances, additional method-based components may affect the pure
Poisson sampling variance such that the distribution conforms either to a lognormal or a
negative binomial distribution. For routine purposes one can be reasonably confident that a
logarithmic transformation will be appropriate. That is to say that the data value $x$ is replaced by
a value $y$, where $y = \log x$ or $y = \log(x + 1)$ depending upon whether or not any zero counts
are involved (Table 3.5).
Occasionally it may be necessary to back transform the derived arithmetic mean value to
the original scale (see calculation of geometric mean, Example 2.1), although transformed
values are frequently cited in microbiological texts as, for instance, log cfu/g. If back
transformation is required, it is essential that the transformation is totally reversed, that is for
the $\sqrt{x + 0.5}$ transformation of Poisson data, square the transformed value and subtract
0.5; for $\log(x + 1)$, take the antilog and then subtract 1 (although this correction is usually
insignificant).
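The two reversals described above can be sketched as follows. The function names and the small set of counts are hypothetical, chosen only to show a round trip and the back-transformation of a mean calculated on the log scale; base 10 logarithms are assumed.

```python
from math import log10, sqrt


def back_from_sqrt(y: float) -> float:
    """Reverse y = sqrt(x + 0.5): square, then subtract 0.5."""
    return y ** 2 - 0.5


def back_from_log_plus_one(y: float) -> float:
    """Reverse y = log10(x + 1): take the antilog, then subtract 1."""
    return 10 ** y - 1


counts = [0, 3, 7, 12]  # hypothetical counts, for illustration only
mean_log = sum(log10(x + 1) for x in counts) / len(counts)
print(f"mean on log scale = {mean_log:.3f}")
print(f"back-transformed mean = {back_from_log_plus_one(mean_log):.2f}")
```

The back-transformed value is a geometric-type mean and will generally be smaller than the arithmetic mean of the original counts.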
Transformation of data is an essential requirement for most parametric statistical analy-
sis of quantitative data obtained in microbiological analysis. Non-parametric procedures
offer alternative means of data analysis, since such methods are by definition distribution-
free, but they may be unreliable since they make certain assumptions about the shape and
dispersion of the distribution, which must be the same for all the groups compared.


[Figure 3.10 plots. Three panels with y-axis Frequency (f) (%): (a) x-axis: Count (x); (b)(i) x-axis: Transformed count (log(x + k/2)); (b)(ii) x-axis: Transformed count (log(x + 1))]

FIGURE 3.10 Frequency distributions of microscopic counts of bacterial cells/field (a) before and (b) after
transformation of data. Transformation (b)(i) used the formula $\log(x + k/2)$ and transformation (b)(ii) used
$\log(x + 1)$, where $x$ is the actual number of bacteria/field.


EXAMPLE 3.5 TRANSFORMATION OF A NEGATIVE BINOMIAL DISTRIBUTION

The data cited in Example 3.4 are transformed using (a) $\log(x + (k/2))$ using the best
value calculated for $k$ and (b) $\log(x + 1)$, which assumes that the distribution is unknown
but that $s^2 > \bar{x}$. The original frequency distribution, with $\bar{x} = 2.4175$ and $\hat{k} = 3.915$,
and the derived frequencies (a) and (b) are:

Observed          Count     log (x + (k/2))     log (x + 1)
frequency (f)     (x)       (a)                 (b)

 56                0        0.292               0.000
104                1        0.471               0.301
 80                2        0.597               0.477
 62                3        0.695               0.602
 42                4        0.775               0.699
 27                5        0.842               0.778
  9                6        0.901               0.845
  9                7        0.952               0.903
  5                8        0.998               0.954
  3                9        1.040               1.000
  2               10        1.078               1.041
  1               19        1.321               1.301

A comparison of the frequency curves is given in Fig. 3.10; note that the asymmetry is
markedly reduced by transformation.
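The transformed values in the table can be regenerated directly from the two formulae. A minimal sketch, assuming base 10 logarithms and the value $\hat{k} = 3.915$ quoted in this example; the function name `transformed` is illustrative.

```python
from math import log10

K_HAT = 3.915  # value of k quoted in Example 3.5


def transformed(x: int):
    """Return (a) log10(x + k/2) and (b) log10(x + 1) for a count x."""
    return log10(x + K_HAT / 2), log10(x + 1)


for x in (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 19):
    a, b = transformed(x)
    print(f"x = {x:2d}   (a) = {a:.3f}   (b) = {b:.3f}")
```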

References

Anscombe, FJ (1949) The statistical analysis of insect counts based on the negative binomial
distribution. Biometrics, 5, 165–173.
Anscombe, FJ (1950) Sampling theory of the negative binomial and logarithmic series distributions.
Biometrika, 37, 358–382.
Bliss, CI and Fisher, RA (1953) Fitting the binomial distribution to biological data and a note on the
efficient fitting of the negative binomial. Biometrics, 9, 176–200.
Debauche, HR (1962) The structural analysis of animal communities in the soil. In Murphy, PW (ed.)
Progress in Soil Zoology. Butterworth, London, pp. 10–25.
Dodd, AH (1969) The theory of disinfectant testing with a mathematical and statistical section, 2nd
edition. Swifts (P and D) Ltd, London.
Elliott, JM (1977) Some methods for the statistical analysis of samples of benthic invertebrates.
(2nd edition). Freshwater Biological Association Scientific Publication No. 25. Ambleside,
Cumbria, UK.


Fisher, RA, Corbett, AS and Williams, CB (1953) The relation between the number of species and the
number of individuals in a random sample of an animal population. J. Animal Ecol., 12, 42–58.
Fisher, RA and Yates, F (1974) Statistical tables for biological, agricultural and medical research, 6th
edition. Longman Group, London.
Gurland, J (1959) Some applications of the negative binomial and other contagious distributions.
Amer. J. Pub. Health, 49, 1388–1399.
Jones, PCT, Mollison, JE and Quenouille, MH (1948) A technique for the quantitative estimation of
soil microorganisms. Statistical note. J. Gen. Microbiol., 2, 54–69.
Morgan, MR, MacLeod, P, Anderson, EO and Bliss, CI (1951) A sequential procedure for grading
milk by microscopic counts. Storrs Agricultural Experimental Station Bulletin No. 276.
National Bureau of Standards (1950) Tables of Binomial Probability Distribution. Applied
Mathematics Series No. 6, Washington, DC.
Pearson, ES and Hartley, HO (1966) Biometrika Tables for Statisticians, 3rd edition. University Press,
Cambridge.
Takahashi, K, Ishida, S and Kurokawa, M (1964) Statistical consideration of sampling errors in total
bacteria cell count. J. Med. Sci. Biol., 17, 73–86.
Williamson, E and Bretherton, MH (1963) Tables of the Negative Binomial Probability Distribution.
Wyman, New York.
Ziegler, NR and Halvorson, HO (1935) Application of statistics to problems in bacteriology. IV
Experimental comparison of the dilution method, the plate count and the direct count for the
determination of bacterial populations. J. Bacteriol., 29, 609–634.
