Professional Documents
Culture Documents
Yang 2013
Yang 2013
Yang 2013
RESEARCH
ABSTRACT: The primary purpose of an environmental monitoring program is to provide oversight for microbiolog-
ical cleanliness of manufacturing operation and document the state of control of the facility. Key to the success of
the program is the establishment of alert and action limits. In practice, several statistical methods including normal,
Poisson, and negative binomial modeling have been routinely used to set these limits. However, data collected from
clean rooms or controlled locations often display excess of zeros and overdispersion, caused by sampling population
heterogeneity. Such data render it inappropriate to use the traditional methods to set alert and action levels. In this
paper, a method based on a zero-inflated negative binomial model is proposed for the above instances. The method
provides an enhanced alternative for trending environmental data of classified rooms, and it is demonstrated to show
a clear improvement in terms of model fitting and parameter estimation.
KEYWORDS: Alert and action limits, Environmental monitoring, Zero-inflated model, Negative binomial distribu-
tion, Non-parametric methods, Poisson distribution
LAY ABSTRACT: The primary purpose of an environmental monitoring program is to provide oversight for
microbiological cleanliness of manufacturing operation and document the state of control of the facility. Key to the
success of the program is the establishment of alert and action limits. In practice, several statistical methods including
normal, Poisson, and negative binomial modeling have been routinely used to set these limits. However, data collected
from clean rooms or controlled locations often display excess of zeros and overdispersion, caused by sampling
population heterogeneity. Such data render it inappropriate to use the traditional methods to set alert and action levels.
In this paper, a method based on a zero-inflated negative binomial model is proposed for the above instances. The
method provides an enhanced alternative for trending environmental data of classified rooms, and it is demonstrated
to show a clear improvement in terms of model fitting and parameter estimation.
scribed previously (3). However, the reliability of In this paper, we develop a complementary approach
these non-parametric approaches depends heavily on to setting alert and action limits for classified systems.
the amount of data. In the absence of large data sets, The method is based on a zero-inflated negative bino-
non-parametric approaches run the risk of selecting as mial model discussed by Erdman et al. (7, 8). Micro-
the control limits the outlying data points caused by bial recoveries of a clean room are considered as being
out-of-control conditions, resulting in inflated esti- generated from two stochastic processes. The first
mates of alert and action limits. Therefore it is always process, representing sterile sampled population, gen-
preferable to use model-based methods. erates zero counts, whereas the second process, rep-
resenting sampled population with some tendency to
A more frequently used model for count data of be nonsterile, produces count data from a negative
skewed empirical distribution is the Poisson model (4, binomial model. The chance for each process to pro-
5). The Poisson model assumes that the data are uni- duce the observed results is modelled through a Ber-
formly dispersed. With the Poisson model, the mean of noulli distribution. The method provides a simple and
the distribution is equal to the variance. However, the flexible alternative to the approaches currently in use
real-life environmental data are collected in various for setting alert and action limits, and it has been
locations and time points and are often over-dispersed. shown to provide better model fitting and more accu-
This is primarily caused by heterogeneity of sampling rate estimates to the alert and action limits.
populations, as samples collected at different locations
and times often come from different random pro- Zero-Inflated Negative Binomial Model
cesses. As a consequence, a single-parameter Poisson
model is not adequate to describe environmental data Let X ⫽ {X i , i ⫽ 1, . . . , n} denote the microbial
consisting of several subpopulations and tends to un- counts of n samples taken in a monitoring cycle. X i is
derestimate the variability of the data, thus resulting in model through the following zero-inflated negative
more frequent false alarms of microbial excursion. binomial (ZINB) model:
Hoffman (4) and Christensen et al. (5) suggest using
the negative binomial distribution to correct the effect X i ⫽ WY ⫹ 共1 ⫺ W兲 Z (1)
of overdispersion. The negative binomial is a gener-
alization of the Poisson model. It models data at each where W is Bernoulli random variable with Pr[W ⫽ 1]
sampling location through a Poisson distribution, but ⫽ p, Y is a degenerated random variable with Pr(Y ⫽
it allows the mean of the Poisson distribution to vary 0) ⫽ 1, Z follows a negative binomial distribution
across locations according to a gamma distribution. NB(, ), with density function given by
Because the negative binomial has one more parame-
ter than the Poisson, the second parameter can be used ⌫共1/k ⫹ x i兲共k兲 xi
to adjust the variance independently of the mean, thus g共 x i兩, k兲 ⫽ (2)
x i!⌫共1/k兲共1 ⫹ k兲 1/k⫹xi
accommodating overdispersion.
where ⬎ 0 and k ⬎ 0. The distribution of Z has a
As noted by Caputo and Huffman (6), in Class 100 mean and variance (1 ⫹ k). It is evident that the
(ISO 5) clean rooms and isolation systems, most of variance is greater than mean, whereas the variance
time there is no observation of microbial recovery. and mean are equal for the Poisson distribution. As a
These frequent zero test results make normal, Poisson, consequence, the negative binomial distribution is
or negative binomial based models insufficient. They useful for modeling over-dispersed data. It is also
propose to monitor frequency of non-zero microbial assumed that W, Y, and Z are independent. The excess
recovery, as opposed to trend individual observations. of zeros in the observed data is described by the
However, by focusing the monitoring on the frequency random variable Y, which has a point mass at zero.
of events, one might not fully utilize all information The density function of the observed microbial count
contained in the measurements. For example, the mag- X i is
nitude of intermittent non-zero results is not reflected
in the frequency analysis. Even if there are excursions f共 x i兩p, , k兲
exceeding the alert or action limit, they may go unde-
tected. In addition, the Caputo and Huffman’s method
does not account for heterogeneity of the subpopula-
⫽ 再 共1p ⫹⫺共1p兲⫺g共p兲x 兩,g共 xk兲兩, k兲if
i
i if x i ⫽ 0
xi ⬎ 0 .
Several observations can be made about the zero- Let 1 ⫺ ␣1 and 1 ⫺ ␣2 be the confidence levels of alert
inflated model specified through eqs 1–3. Firstly, when and action limits, respectively, such that 0 ⬍ 1 ⫺ ␣1 ⬍
p ⫽ 0, f( x i 兩p, , k) ⫽ g( x i 兩, k). That is, X i follows 1 ⫺ ␣2 ⬍ 1. In practice, ␣1 and ␣2 are conventionally
a negative binomial distribution. Secondly, when p ⫽ chosen to be 0.05 and 0.01, respectively. Thus the alert
0, k 3 0 and k 3 , f( x i 兩p, , k) ⫽ g( x i 兩, k) 3 limit (ALL) and action limit (ACL) can be determined
x ie⫺x i as
. This implies that X i approximately follows a
xi !
再冘 冎
Poisson distribution with mean microbial count of . n
Lastly, under the above conditions, assuming that is ˆ ⫽ min n:
ALL f̂共 x兩p̂, ˆ , k̂兲 ⱖ 1 ⫺ ␣1 (6)
large, because X i , being a Poisson random variable of x⫽0
mean , can be expressed as sum of independent
再冘 冎
Poisson variables of mean 1, it is approximately nor- n
mally distributed with both mean and variance of . ˆ ⫽ min n:
ACL f̂共 x兩p̂, ˆ , k̂兲 ⱖ 1 ⫺ ␣2 .
As a result, the normal, Poisson, and negative bino- x⫽0
mial distribution can be viewed, either in an exact or
(7)
approximately sense, as special cases of zero-inflated
negative binomial models.
The two limits are the numbers that enable the in-
equalities in eqs 6 and 7 to meet for the first time.
Parameter Estimation
They are the smallest integers that are bigger than or
equal to the 100(1 ⫺ ␣1) and 100(1 ⫺ ␣2) percentiles
Let n x (0 ⬍ x ⱕ m) denote the number of observations
of the distribution of X i , and they can be obtained
out of n samples X ⫽ {X i , i ⫽ 1, . . . , n} that take
through iterative calculation. However, the accuracy
value x. In particular, n 0 is the number of zeros in the
of the two limits depends how closely the MLEs ( p̂,
samples. The log-likelihood function of X is given by
ˆ , k̂) approximate ( p, , k). In general, the larger the
冋 册
sample size n, the more accurate the estimates are.
1
L共 p, , k兲 ⫽ n 0log p ⫹ 共1 ⫺ p兲
共1 ⫹ k兲1/k
Comparison with Other Methods
再
ˆ 共m, n兲 ⫽ ˆ ⫹ 1.96ô 2
p̂ ⫹ 共1 ⫺ p̂兲 g共 x i兩ˆ , k̂兲 if x i ⫽ 0 ALL (8)
⫽ .
共1 ⫺ p̂兲 g共 x i兩ˆ , k̂兲 if x i ⬎ 0
(5) ˆ 共m, n兲 ⫽ ˆ ⫹ 2.33ô 2.
ACL (9)
TABLE I
Mean Estimates of Alert and Action Limits Based on 1000 Simulated Data Sets
For the other three models, the estimates for ALL and The true ALL and ACL are determined to be 5 and 7,
ACL are determined to be the smallest integers greater respectively, based on the density function in eq 3
than or equal to 95th and 99th percentile. Specifically, with model parameters ( p, , k) being replaced by
let f̂( x兩ˆ ) be the estimated probability density func- their true values (0.5, 2.8, 0.04). The results are sum-
tion, which, for instance, takes the functional form in marized in Table I.
eq 5 for the ZINB random variable. The estimates of
ALL and ACL are given by It is evident that the normal and Poisson models un-
derestimate the dispersion of the data, resulting in
再冘 冎
n
narrower alert and action limits. The negative bino-
ˆ 共m, n兲 ⫽ min n:
ALL f̂共x兩ˆ 兲 ⱖ 0.95 (10) mial, presumably appropriate for modeling overdis-
x⫽0 persion, overestimates the data dispersion to account
for the excess of zeros, giving rise to wider limits. By
再冘 冎
n
contrast, the ZINB model provides the most accurate
ˆ 共m, n兲 ⫽ min n: estimates of ALL and ACL. For a moderate sample
ACL f̂共x兩ˆ 兲 ⱖ 0.99 .
x⫽0 size n ⫽ 100, the ALL and ACL are very close to their
respective true limits. The estimated ALL and ACL
(11)
converge to the true limits as the sample size in-
creases. In addition, the precision of the ALL and ACL
The mean ALL and ACL estimates and their SD are
estimates, characterized by SD, improves as the sam-
calculated as follows:
ple size becomes larger. However, the sample size has
little effect on the accuracy of the estimates based on
冘 ALL
1000
the normal, negative binomial, and Poisson models.
ˆ 共m, n兲
The results suggest that when data exhibit excess of
m⫽1
ˆ 共n兲 ⫽
ALL and SDALL 共n兲 zeros and overdispersion, ZINB demonstrates an ex-
1000
冑
cellent performance in estimating ALL and ACL and
that the other three models result in increased risk of
冘 关ALL
1000
ˆ共m, n兲 ⫺ ALL
ˆ共n兲兴 2 either false alarm of microbial excursions or under-
m⫽1 detection of adverse events.
⫽ (12)
1000 ⫺ 1
Figure 1 displays estimated density functions of the
four models imposed on the histogram of the microbial
冘 ACL
1000
ˆ 共m, n兲 data from the first round of simulation (m ⫽ 1 and n ⫽
1). Because the data contain excess of zeros and
ˆ 共n兲 ⫽ m⫽1
ACL and SDACL 共n兲 exhibit overdispersion, none of the normal, negative,
1000
冑
and Poisson models fits the data well, which becomes
冘 关ACL
1000 especially clear when comparing the actual with mod-
ˆ共m, n兲 ⫺ ACL
ˆ共n兲兴 2 eled counts of zero for the normal and Poisson model,
m⫽1 and non-zero observed and modeled counts for the
⫽ . (13)
1000 ⫺ 1 negative binomial. The superiority of the ZINB to the
Figure 2
Procedure used for determining data distribution and appropriate model for estimating alert and action limits.
and action limits, causing high rate of false positive program should include an evaluation process that
or negative outcomes. The above procedure helps allows for testing data distribution and proper selec-
find models suitable for the data. The effectiveness tion of models.
of the method can be evaluated through simulation.
However, a full description of this is beyond the Acknowledgements
scope of this paper. With the help of statisticians,
the method can be automated through R program- We would like to thank Dr. Lanju Zhang for useful
ming. discussion of the manuscript. We also would like to
thank the referees for their helpful comments, which
Conclusions greatly helped improve the paper.
3. Conover, W. J. Practical Nonparametric Statis- els using the COUNTREG procedure. SAS Global
tics, 3rd ed. Wiley: New York, 1999. Forum 2008, Paper 322.
4. Hoffman, D. Negative binomial control limits for 8. Lambert, D. Zero-inflated Poisson regression
count data with extra-Poisson variation. Pharma- models with an application to defects in manufac-
ceutical Statistics 2004, 2 (2), 127–132. turing. Technometrics 1992, 34 (1), 1–14.
5. Christensen, A.; Melgaard, H.; Iwersen, J. Envi- 9. SAS Institute, Inc. SAS/STAT user’s guide, Ver-
ronmental monitoring based on a hierarchical Pos- sion 9. SAS Institute: Cary, NC, 2009.
son-Gamma model. J. Qual. Technol. 2003, 35
(3), 275–285. 10. Long, J. S. Regression Models for Categorical
and Limited Dependent Variables. Sage Publica-
6. Caputo, R. A.; Huffman, A. Environmental mon- tions: Thousand Oaks, CA, 1997.
itoring: data trending using a frequency model.
PDA J. Pharm. Sci. Technol. 2004, 58 (5), 254 – 11. Neelon, B. H.; O’Malley, J.; Normand, S. T. A
260. Bayesian model for repeated measures zero-in-
flated count data with application to outpatient
7. Erdman, D.; Jackson, L.; Sinko, A. Zero-inflated psychiatric service use. Stat. Modelling 2010, 10
Poisson and zero-inflated negative binomial mod- (4), 421– 439.