
A Bayesian Approach to Change Point
Estimation in Multivariate SPC

RONG PAN
Arizona State University, Tempe, AZ 85287
STEVEN E. RIGDON
Southern Illinois University Edwardsville, Edwardsville, IL 62026
A Bayesian procedure is developed to estimate the time of a change in the process mean vector for
a multivariate process given that an out-of-control signal was raised on a multivariate control chart. In
addition, we can infer simultaneously which variable(s) had a change in mean value, when the change
occurred, and the value of the changed mean. All three problems (change point time, variables that
shifted, and new values for the shifted variables) are addressed in a single statistical model. Markov chain
Monte Carlo (MCMC) methods, through the software WinBUGS, are used to estimate parameters of the
change point models. To identify the mean shift in a process with more than two variables, we propose a
branch-and-bound search algorithm so that MCMC can be carried out with a predictable computing time in
each search step. A simulation study shows that the Bayesian approach performs similarly to maximum likelihood estimation (MLE) in terms of identifying the true change point location when a
noninformative prior is assumed; however, it can perform better when proper prior knowledge is incorporated
into the estimation procedure. The Bayesian approach provides full posterior distributions for the model
and change point, which can contain information that is not available in a likelihood analysis.
Key Words: Markov Chain Monte Carlo; MEWMA; Process Mean Shift Model; Search Algorithm.
Problem Description

When a multivariate control chart, such as the $T^2$ chart or the multivariate exponentially weighted moving average (MEWMA) control chart, raises an out-of-control signal, several questions arise that may aid in the identification of the cause. When did the change occur? Which variables shifted? What are the new mean values of the shifted variables?

In this paper, we study Bayesian methods of addressing these questions. The Bayesian method subsumes all three of these issues into a single statistical model. Also, the posterior distributions (of the change point, for example) can convey information that is not available in a likelihood-based analysis. Finally, the Bayesian method can make use of prior information about the parameters.

Control charting can be broken down into two parts, or phases. In Phase I, we collect data for the purpose of estimating parameters and eventually constructing a control chart. In this phase, we may create preliminary limits and determine which points, if any, were out of control during the period of data collection. In this situation, we would not know the values of the parameters before or after the shift, and we would not know whether or not there was a change point. In Phase II, we usually assume that the parameters are known before the shift; when a shift occurs, however, the time of the shift, the variables (components of the observed vector of quality characteristics) that shifted, and their new values after the shift are all unknown. In this paper we assume a Phase II analysis that, when an out-of-control signal is raised, looks back on the past data to address the three questions posed above.

Dr. Pan is Associate Professor of Industrial Engineering in the School of Computing, Informatics and Decision Systems Engineering. He is a Senior Member of ASQ. His email address is rong.pan@asu.edu.

Dr. Rigdon is Distinguished Research Professor, Department of Mathematics and Statistics. He is a Senior Member of ASQ. His email address is srigdon@siue.edu.
Vol. 44, No. 3, July 2012 231 www.asq.org

Multivariate processes are of interest here, where we assume that the process variables follow a multivariate normal distribution. We assume that only a mean change is possible and that the covariance matrix of the process variables is fixed (this constraint can be easily relaxed, as we will see in an example later). The mean change is a shift in level. For example, when two process variables are monitored simultaneously, there are three possible types of change:

1. the first process variable's mean has shifted;
2. the second process variable's mean has shifted;
3. both variables' means have shifted.

Therefore, besides identifying when the shift happened, we also need to identify which change model occurred. Specifically, writing $\mu_{m1}$ and $\mu_{m2}$ for the before- and after-shift means of variable $m$, the three models are

Model 1
$$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k \sim N\left(\begin{pmatrix}\mu_{11}\\ \mu_{21}\end{pmatrix}, \Sigma\right) \quad \text{and} \quad \mathbf{X}_{k+1}, \mathbf{X}_{k+2}, \ldots, \mathbf{X}_n \sim N\left(\begin{pmatrix}\mu_{12}\\ \mu_{21}\end{pmatrix}, \Sigma\right)$$

Model 2
$$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k \sim N\left(\begin{pmatrix}\mu_{11}\\ \mu_{21}\end{pmatrix}, \Sigma\right) \quad \text{and} \quad \mathbf{X}_{k+1}, \mathbf{X}_{k+2}, \ldots, \mathbf{X}_n \sim N\left(\begin{pmatrix}\mu_{11}\\ \mu_{22}\end{pmatrix}, \Sigma\right)$$

Model 3
$$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k \sim N\left(\begin{pmatrix}\mu_{11}\\ \mu_{21}\end{pmatrix}, \Sigma\right) \quad \text{and} \quad \mathbf{X}_{k+1}, \mathbf{X}_{k+2}, \ldots, \mathbf{X}_n \sim N\left(\begin{pmatrix}\mu_{12}\\ \mu_{22}\end{pmatrix}, \Sigma\right)$$

In general, when there are $p$ process variables, there exist $2^p - 1$ mean shift models. This number becomes very large even for a modest value of $p$, so eliciting all possible shift models is troublesome. Therefore, model selection presents a major challenge in the multivariate change point problem. In this paper we discuss a Bayesian approach to this problem. A Bayesian analysis can provide full posterior distributions for these quantities, giving information that is not available in a likelihood analysis. For the likelihood-based approach to change point estimation, we refer the reader to Sullivan and Woodall (2000), Zamba and Hawkins (2009), and their related references.

Bayesian Method for a Univariate Process

Study of the Bayesian approach to the change point problem can be traced back to Barnard (1959) and Chernoff and Zacks (1964), who mentioned that their study was motivated by a "tracking" problem, where occasional changes in the direction of a path were of interest. A decade later, Smith (1975) gave a more comprehensive treatment of Bayesian inference about a single change point in a univariate process. Both the binomial distribution and the normal distribution were discussed in Smith (1975). Here we briefly summarize the result for the normal distribution.

Consider a time series from independent normal distributions, $X_1, X_2, \ldots, X_k \sim N(\mu_1, \sigma_1^2)$ and $X_{k+1}, X_{k+2}, \ldots, X_n \sim N(\mu_2, \sigma_2^2)$. The main task is then to find the posterior distribution of $k$, the time of the process mean change. After this change point, the process mean shifts to a new, unknown level, and we would like to estimate this value. We assume that there is a single process change, which occurs after time $k$, although our methods could be applied repeatedly to the before- and after-shift data. For example, if a signal is raised at time 40 and the point estimate of $k$ is 20, we could apply the methods separately to data values 1 through 20 and 21 through 40. Such an approach could detect multiple change points as well as different sets of variables changing at different times.

There are numerous models that could be considered based on what is assumed known and what is unknown. For example, we could assume that $\mu_1$ and $\sigma_1^2$ are known, whereas $\mu_2$, $\sigma_2^2$, and $k$ are unknown. We could also assume that $\sigma_2^2$ is known, such that $\sigma_2^2 = \sigma_1^2$, leaving $\mu_2$ and $k$ as the only unknown quantities. This is the assumption we make in the remainder of this section, but other combinations are certainly possible.

Journal of Quality Technology Vol. 44, No. 3, July 2012

To facilitate the analysis, we use a precision parameter $\tau$, which is the reciprocal of the variance ($\tau = 1/\sigma^2$). Then, assuming that the before-shift and after-shift variances are equal (i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$, or equivalently $\tau_1 = \tau_2 = \tau$), the likelihood functions for the data before and after the change are
$$L(x_1, \ldots, x_k; \mu_1, \tau, k) = \prod_{i=1}^{k} \sqrt{\frac{\tau}{2\pi}} \exp\left[-\frac{\tau}{2}(x_i - \mu_1)^2\right]$$
and
$$L(x_{k+1}, \ldots, x_n; \mu_2, \tau, k) = \prod_{i=k+1}^{n} \sqrt{\frac{\tau}{2\pi}} \exp\left[-\frac{\tau}{2}(x_i - \mu_2)^2\right].$$
The total likelihood function is therefore
$$L(x_1, \ldots, x_n; \mu_1, \mu_2, \tau, k) = \left(\frac{\tau}{2\pi}\right)^{n/2} \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n}(x_i - \mu_2)^2\right)\right].$$

For $\mu_2$ (which, together with $k$, is one of the two unknowns under our assumptions), we use a conjugate prior, the normal distribution $N(\alpha, (\lambda\tau)^{-1})$, which has probability density function (PDF)
$$p_0(\mu_2) = \sqrt{\frac{\lambda\tau}{2\pi}} \exp\left[-\frac{\lambda\tau}{2}(\mu_2 - \alpha)^2\right].$$
Here $\alpha$ and $\lambda$ are considered to be known and reflect the kind of shift that can be expected. For example, if a process is operating with a mean of 0.5 and standard deviation 0.02, a shift to 0.6 might be as large as can be expected. In this case we would want 0.6 to be near the upper tail of the prior distribution; we could then choose $\alpha$ and $\lambda$ accordingly.

Given that a change has occurred, a noninformative prior for $k$ is the uniform distribution
$$p_0(k) = \frac{1}{n-1}, \quad k = 1, 2, \ldots, n-1.$$
Let the prior distributions of $k$ and $\mu_2$ be independent.

Based on Bayes' theorem, the posterior distribution of $(\mu_2, k)$ is (the derivation is given in Appendix A)
$$\begin{aligned}
p(\mu_2, k \mid x_1, \ldots, x_n) &\propto L(x_1, \ldots, x_n; \mu_1, \mu_2, \tau, k)\, p_0(\mu_2)\, p_0(k) \\
&\propto \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n} x_i^2\right)\right] \\
&\quad \times \exp\left[-\frac{\tau(n-k+\lambda)}{2}\left(\mu_2^2 - 2\,\frac{\lambda\alpha + (n-k)\bar{x}'}{n-k+\lambda}\,\mu_2\right)\right],
\end{aligned}$$
where $\bar{x}' = \sum_{i=k+1}^{n} x_i/(n-k)$ is the average of the observations after the mean shift.

Integrating out $\mu_2$, we find that the marginal posterior density of the change point $k$ is
$$p(k \mid x_1, \ldots, x_n) \propto \frac{1}{\sqrt{n-k+\lambda}} \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n} x_i^2 - \frac{\left(\lambda\alpha + (n-k)\bar{x}'\right)^2}{n-k+\lambda}\right)\right].$$
Inference for $k$ is then based on this posterior distribution. A point estimate of the change point is the posterior mode, that is, the value of $k$ that maximizes $p(k \mid x_1, \ldots, x_n)$.

If we assume a noninformative prior for $\mu_2$ by setting its precision hyperparameter $\lambda$ equal to zero (implying an infinite prior variance), then the posterior of the change point is
$$p(k \mid x_1, \ldots, x_n) \propto \frac{1}{\sqrt{n-k}} \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n}(x_i - \bar{x}')^2\right)\right].$$
Apart from the factor $1/\sqrt{n-k}$, this is proportional to the likelihood function evaluated at $\mu_2 = \bar{x}'$, the maximum likelihood estimate of $\mu_2$, so the maximum likelihood estimate of $k$ and the posterior mode are essentially the same in this case.

This fully Bayesian parametric approach to the change point problem was extended by using a hierarchical Bayes (HB) model in Carlin et al. (1992). The HB model provided more flexibility in defining the prior distributions of model parameters such as the before- and after-shift process means. The posterior densities of the parameters of interest in Carlin et al. (1992) were obtained through Markov chain Monte Carlo (MCMC) simulations. These authors also showed ways of applying their method to stationary, but autocorrelated, processes and to regression models.
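The marginal posterior of the change point derived above for the univariate case can be evaluated directly on the log scale. The following is a minimal Python (NumPy) sketch under this section's assumptions: $\mu_1$ and $\tau$ known, the conjugate prior $N(\alpha, (\lambda\tau)^{-1})$ on $\mu_2$, and a uniform prior on $k$. The function names `change_point_posterior` and `profile_log_likelihood` are hypothetical illustrations, not code from the original study. Setting `lam=0` gives the noninformative case, and the second function evaluates the likelihood with $\mu_2$ replaced by $\bar{x}'$ so that the posterior mode can be compared with the maximum likelihood estimate of $k$.

```python
import numpy as np


def change_point_posterior(x, mu1, tau, alpha, lam):
    """Posterior p(k | x), k = 1, ..., n-1, for a single shift in a univariate
    normal mean. Assumes mu1 and the common precision tau are known, the
    after-shift mean has the conjugate prior mu2 ~ N(alpha, 1/(lam*tau)),
    and k has a uniform prior. lam = 0 gives the noninformative case."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ks = np.arange(1, n)                 # k = last observation before the shift
    log_post = np.empty(ks.size)
    for idx, k in enumerate(ks):
        before, after = x[:k], x[k:]
        m = n - k                        # number of after-shift observations
        s = lam * alpha + after.sum()    # lam*alpha + (n - k) * xbar'
        quad = (np.sum((before - mu1) ** 2) + np.sum(after ** 2)
                - s ** 2 / (m + lam))
        log_post[idx] = -0.5 * np.log(m + lam) - 0.5 * tau * quad
    post = np.exp(log_post - log_post.max())   # subtract max to avoid underflow
    return ks, post / post.sum()


def profile_log_likelihood(x, mu1, tau):
    """Log-likelihood of k (up to a constant) with mu2 set to its MLE xbar'."""
    x = np.asarray(x, dtype=float)
    ks = np.arange(1, x.size)
    ll = np.array([-0.5 * tau * (np.sum((x[:k] - mu1) ** 2)
                                 + np.sum((x[k:] - x[k:].mean()) ** 2))
                   for k in ks])
    return ks, ll


# Noise-free illustration: mean 0 for 20 points, then a shift to 2.
x = np.r_[np.zeros(20), np.full(20, 2.0)]
ks, post = change_point_posterior(x, mu1=0.0, tau=1.0, alpha=0.0, lam=0.0)
k_hat = ks[np.argmax(post)]              # posterior mode of k
ks_ll, ll = profile_log_likelihood(x, mu1=0.0, tau=1.0)
k_mle = ks_ll[np.argmax(ll)]             # maximum likelihood estimate of k
```

With these noise-free data both criteria select $k = 20$. On noisy data the two may occasionally differ slightly, because the posterior carries the extra factor $1/\sqrt{n-k}$.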

Multivariate Process Change Point Model Estimation

Bivariate Case: Before-Shift Parameters Known

Before we address the general multivariate case, we begin with the case where there are just two process variables. By assuming that the before-shift parameters are known, we are essentially in Phase II statistical process control (SPC), where we monitor the process for changes in the mean vector. We assume that the bivariate process is monitored using some standard bivariate chart, such as the $T^2$ chart or the MEWMA chart (Lowry et al. (1992)). Once a signal is raised in this Phase II analysis, we would like to look back (i.e., retrospectively) to determine when the change occurred, which variables changed, and the new means for those variables that shifted.

The process mean shift may occur in one of the three cases mentioned previously. The prior probabilities are
$$\begin{aligned}
\pi_1 &= P(\text{Model 1}) = P(\text{only the first mean changed}),\\
\pi_2 &= P(\text{Model 2}) = P(\text{only the second mean changed}),\\
\pi_3 &= P(\text{Model 3}) = P(\text{both means changed}),
\end{aligned}$$
where $\pi_1 + \pi_2 + \pi_3 = 1$. Assuming no prior knowledge of the mean change model, we would take $\pi_1 = \pi_2 = \pi_3 = 1/3$.

We assume for now that the parameters before the change are known, and without loss of generality we assume
$$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k \sim N\left(\begin{pmatrix}0\\ 0\end{pmatrix}, I\right).$$
After the mean shift, one of the following models holds:

Model 1
$$\mathbf{X}_{k+1}, \mathbf{X}_{k+2}, \ldots, \mathbf{X}_n \sim N\left(\begin{pmatrix}\mu_1\\ 0\end{pmatrix}, I\right)$$
with prior probability $\pi_1$;

Model 2
$$\mathbf{X}_{k+1}, \mathbf{X}_{k+2}, \ldots, \mathbf{X}_n \sim N\left(\begin{pmatrix}0\\ \mu_2\end{pmatrix}, I\right)$$
with prior probability $\pi_2$;

Model 3
$$\mathbf{X}_{k+1}, \mathbf{X}_{k+2}, \ldots, \mathbf{X}_n \sim N\left(\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}, I\right)$$
with prior probability $\pi_3$.

It is clear that the first two models can be directly reduced to the univariate case described in the previous section. Again, we may assume that the after-shift mean has a normal prior, $N(\alpha, \lambda^{-1})$, so the posterior distribution of the change point $k$ is given by
$$p(k \mid \mathbf{x}_1, \ldots, \mathbf{x}_n) \propto \sqrt{\frac{1}{\lambda + n - k}} \exp\left[-\frac{1}{2}\left(\sum_{i=1}^{n} x_{i1}^2 - \frac{\left(\sum_{i=k+1}^{n} x_{i1} + \lambda\alpha\right)^2}{\lambda + n - k} + \sum_{i=1}^{n} x_{i2}^2\right)\right]$$
for Model 1, while for Model 2 the variables $x_{i1}$ and $x_{i2}$ are interchanged in the above formula. For Model 3, if we assume that the after-shift mean has the bivariate normal prior
$$N\left(\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}, \begin{pmatrix}\lambda_1^{-1} & 0\\ 0 & \lambda_2^{-1}\end{pmatrix}\right),$$
then the marginal posterior of $k$ is
$$p(k \mid \mathbf{x}_1, \ldots, \mathbf{x}_n) \propto \sqrt{\frac{1}{(\lambda_1 + n - k)(\lambda_2 + n - k)}} \exp\left[-\frac{1}{2}\left(\sum_{i=1}^{n} x_{i1}^2 - \frac{\left(\sum_{i=k+1}^{n} x_{i1} + \lambda_1\alpha_1\right)^2}{\lambda_1 + n - k} + \sum_{i=1}^{n} x_{i2}^2 - \frac{\left(\sum_{i=k+1}^{n} x_{i2} + \lambda_2\alpha_2\right)^2}{\lambda_2 + n - k}\right)\right].$$

Although the posterior distribution of the change point can be derived in this bivariate process case, we have not solved the problem of model selection, i.e., which of these three models should be chosen? Model 3 can be viewed as a super model, since it contains Models 1 and 2 as special cases. Therefore, it tends to be favored over the other two. This is an undesirable consequence of the nested model structure. In addition, when the before-shift mean or the process covariance matrix is unknown, or when the prior distribution of the mean is not a normal distribution, the posterior cannot be easily derived and MCMC methods are necessary. We will address these problems in the next section.

FIGURE 1. Notation for the mean vector.
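For the bivariate case with known before-shift parameters (zero mean, identity covariance), the within-model posteriors of $k$ given above can be computed in the same way as in the univariate case. The Python (NumPy) sketch below is illustrative only: the helper names `log_post_k` and `posterior_k` and the default hyperparameters ($\alpha_j = 0$, $\lambda_j = 1$) are assumptions, and the code evaluates $p(k \mid \mathbf{x}_1, \ldots, \mathbf{x}_n)$ separately within each model rather than selecting among the models.

```python
import numpy as np


def log_post_k(x, model, alpha=(0.0, 0.0), lam=(1.0, 1.0)):
    """Unnormalized log p(k | x, model), k = 1, ..., n-1, for the bivariate
    case with known before-shift parameters (mean (0, 0)', covariance I).
    model 1: only variable 1 shifts; 2: only variable 2; 3: both shift.
    alpha[j], lam[j]: prior mean and prior precision of the after-shift mean
    of variable j (hypothetical default values)."""
    x = np.asarray(x, dtype=float)          # shape (n, 2)
    n = x.shape[0]
    shifted = {1: [0], 2: [1], 3: [0, 1]}[model]
    ks = np.arange(1, n)
    total_ss = np.sum(x ** 2)               # sum of x_i1^2 + x_i2^2 over all i
    out = np.empty(ks.size)
    for idx, k in enumerate(ks):
        m = n - k                           # number of after-shift observations
        val = -0.5 * total_ss
        for j in shifted:                   # one term per shifted variable
            s = x[k:, j].sum() + lam[j] * alpha[j]
            val += 0.5 * s ** 2 / (lam[j] + m) - 0.5 * np.log(lam[j] + m)
        out[idx] = val
    return ks, out


def posterior_k(x, model, **prior):
    """Normalize exp(log_post_k) over k = 1, ..., n-1."""
    ks, lp = log_post_k(x, model, **prior)
    w = np.exp(lp - lp.max())               # subtract max to avoid underflow
    return ks, w / w.sum()


# Noise-free illustration: variable 1 shifts from 0 to 2 after observation 20.
x = np.zeros((40, 2))
x[20:, 0] = 2.0
ks, post = posterior_k(x, model=1)
k_hat = ks[np.argmax(post)]                 # posterior mode of k under Model 1
```

For these data, Models 1 and 3 both place the posterior mode at $k = 20$. Under Model 2 the second variable shows no shift, so only the factor $1/\sqrt{\lambda + n - k}$ varies with $k$ and the mode moves to the last candidate, $k = 39$; a within-model posterior of $k$ is therefore not a substitute for model selection.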
Bivariate Case: Before-Shift Parameters Unknown

If we assume that all parameters are unknown and that the variance is unchanged at the time of the mean shift, then, for the bivariate case, the three models are

Model 1
$$\mathbf{X}_1, \ldots, \mathbf{X}_{k_1} \sim N\left(\begin{pmatrix}\mu_{111}\\ \mu_{112}\end{pmatrix}, \Sigma\right) \quad \text{and} \quad \mathbf{X}_{k_1+1}, \ldots, \mathbf{X}_n \sim N\left(\begin{pmatrix}\mu_{121}\\ \mu_{112}\end{pmatrix}, \Sigma\right)$$

Model 2
$$\mathbf{X}_1, \ldots, \mathbf{X}_{k_2} \sim N\left(\begin{pmatrix}\mu_{211}\\ \mu_{212}\end{pmatrix}, \Sigma\right) \quad \text{and} \quad \mathbf{X}_{k_2+1}, \ldots, \mathbf{X}_n \sim N\left(\begin{pmatrix}\mu_{211}\\ \mu_{222}\end{pmatrix}, \Sigma\right)$$

Model 3
$$\mathbf{X}_1, \ldots, \mathbf{X}_{k_3} \sim N\left(\begin{pmatrix}\mu_{311}\\ \mu_{312}\end{pmatrix}, \Sigma\right) \quad \text{and} \quad \mathbf{X}_{k_3+1}, \ldots, \mathbf{X}_n \sim N\left(\begin{pmatrix}\mu_{321}\\ \mu_{322}\end{pmatrix}, \Sigma\right)$$

Here, $\mu_{ijm}$ is the process mean for model $i$ ($i = 1, 2, \ldots, 2^p - 1$), shift identifier $j$ ($j = 1$ before the shift, $j = 2$ after), and variable component $m$ ($m = 1, 2, \ldots, p$). See Figure 1 for an illustration of this notation. We must consider a large set of parameters, because the posterior distribution of the change point will be different if Model 1 holds than if Model 2 holds. This case also requires the use of MCMC, which is described in the next section.

Posterior Distributions Using MCMC

A change point model is complex enough that, even with some modest assumptions, the Bayesian solution is analytically intractable. Without MCMC, the bivariate case described above requires that the process mean and variance be known, that the prior distribution of the after-shift mean be a conjugate prior, and that the mean change model be known at the outset. These are unrealistic assumptions. By using MCMC, a complicated shift model can be easily implemented, the choice of prior distribution is more flexible, and we can investigate model selection along with model parameter estimation. The computational power required by an MCMC algorithm is no longer an issue with today's fast computers, as long as the dimension stays at about 10 or less. Moreover, general-purpose Bayesian software, such as WinBUGS, is available for building models and executing MCMC algorithms quickly.

However, because the number of model parameters may change from model to model, the convergence conditions for the Markov chain are not satisfied. For example, in the bivariate process mean shift problem, a single mean shift model has three mean parameters, but a double mean shift model has four. This problem was thoroughly addressed in Carlin and Chib (1995), who proposed using a pseudoprior method. Suppose there are a total of $m$ models under consideration, and each of them has its own parameter set. A "super-parameter set" is then created by combining all models' parameters. Let the variable $M$ ($M = j$, $j = 1, \ldots, m$) indicate an individual model. In addition, the prior distribution of a sub-parameter set $\theta_j$, which corresponds to model $j$, is defined according to the model indicator $M$. That is, $p_0(\theta_j \mid M = j)$ is the typical prior as defined in a usual Bayesian analysis, but $p_0(\theta_j \mid M \neq j)$ is a pseudoprior, which is used for facilitating the Markov
chain iteration only. The full conditional distribution of $\theta_j$ is given by
$$p(\theta_j \mid \theta_{i \neq j}, M, \mathbf{x}) \propto \begin{cases} L(\mathbf{x}; \theta_j, M = j)\, p_0(\theta_j \mid M = j) & \text{when } M = j,\\ p_0(\theta_j \mid M \neq j) & \text{when } M \neq j. \end{cases}$$
This formula says that the observations do not affect the inference on $\theta_j$ if model $j$ is not selected. The posterior of $M$ is given by
$$p(M = j \mid \theta, \mathbf{x}) = \frac{L(\mathbf{x}; \theta_j, M = j) \prod_{i=1}^{m} p_0(\theta_i \mid M = j)\, \pi_j}{\sum_{k=1}^{m} L(\mathbf{x}; \theta_k, M = k) \prod_{i=1}^{m} p_0(\theta_i \mid M = k)\, \pi_k}.$$
Note that, when using this pseudoprior method, our main purpose is to select the most probable model. The posterior $p(\theta_j \mid \theta_{i \neq j}, M, \mathbf{x})$ is not of interest; instead, the conditional posterior $p(\theta_j \mid \theta_{i \neq j}, M = j, \mathbf{x})$ is the correct one for parameter estimation.

For example, in the bivariate case where the parameters from before and after the shift are unknown, the super-parameter set is
$$\theta = \left[M;\ k_1, k_2, k_3;\ \mu_{111}, \mu_{112}, \mu_{121};\ \mu_{211}, \mu_{212}, \mu_{222};\ \mu_{311}, \mu_{312}, \mu_{321}, \mu_{322};\ \sigma_{11}^2, \sigma_{22}^2, \sigma_{12}\right]'.$$
When Model 1 is chosen, for example, the sub-parameter set is
$$\theta_1 = \left[k_1, \mu_{111}, \mu_{112}, \mu_{121}, \sigma_{11}^2, \sigma_{22}^2, \sigma_{12}\right]'.$$
At each stage of the MCMC, the sampler updates all parameters in the super set, even those that are not part of the current model.

Theoretically, the MCMC samples converge to the target joint distribution asymptotically. In practice, we need to check the convergence of the MCMC chain so as to determine an appropriate sample size. The accuracy of the MCMC algorithm in estimating posterior moments has been discussed in Geweke (1992). We use two popular diagnostic methods adopted in the literature: R-L, proposed by Raftery and Lewis (1996), and G-R, proposed by Gelman and Rubin (1992). The R-L method is based on monitoring the autocorrelation within a single chain. It provides the theoretical minimum number of iterations of an MCMC chain and suggests the number of burn-in iterations and the total number of iterations. The G-R method requires performing multiple simulations, starting from different (overdispersed) initial values; the mixing of the simulated chains and a comparison of the variance within individual chains to the total variance of the mixture of chains are then used to decide on convergence. Both methods are available in the CODA library (Best et al. (1995)) in R.

Model Selection When There Are More Than Two Process Variables

For a bivariate process, there are three change models to consider when identifying its mean change, as described in the previous section. In general, the process's mean vector and variance-covariance matrix are unknown. To apply MCMC to a bivariate process, 17 model parameters need to be sampled at each iteration. These parameters include one model indicator, three change point variables (one for each of the three mean shift models), three variance-covariance components, and 10 process means (three means, including before- and after-shift, for Model 1 and for Model 2, and four means for Model 3). For a trivariate process, there is one model indicator (with $2^3 - 1 = 7$ possible values), 33 means, 7 change point variables (one for each possible model), 3 variances, and 3 covariances. For a $p$-variate process, the number of means is
$$\binom{p}{1}(p+1) + \binom{p}{2}(p+2) + \cdots + \binom{p}{p}(p+p) = 3p\,2^{p-1} - p.$$
Thus, for a $p$-variate process, there is one model indicator (with $2^p - 1$ possible values), $3p\,2^{p-1} - p$ mean parameters, $2^p - 1$ change point variables, $p$ variances, and $\binom{p}{2}$ covariances. Since this "curse of dimensionality" prevents the direct application of MCMC to the multivariate change point problem with more than two or three process variables, we propose a branch-and-bound algorithm to handle the change point problem for a high-dimensional multivariate process. In essence, we conduct a series of model comparisons, and in each comparison only two models are evaluated. Due to the hierarchical structure of multivariate mean-shift models, we can start from the model where all means have shifted at a change point and compare it with a model where all means except one have shifted. The all-means-change model is at the top of the hierarchy, and these all-means-but-one-change models are the direct sub-models of the all-means-change model. Using the Bayes factor as a criterion, if the all-means-change
model is more plausible than an all-means-but-one-change model, then we eliminate the latter model from consideration and continue by comparing the all-means-change model with another all-means-but-one-change model. When an all-means-but-one-change model is found to be better than the all-means-change model, the no-change mean is fixed, and we continue to try out the other means. We then compare this all-means-but-one-change model with the all-means-but-two-change models in which the previous no-change mean remains fixed, i.e., its direct sub-models. This process continues until we find a model all of whose sub-models are inferior.

We code each multivariate change point model as a series of 0-1 codes, where 1 indicates a mean change and 0 indicates no change. For example, the change point models of a 4-dimensional multivariate process are shown in Table 1.

TABLE 1. Mean-Change Models and Their Codes for a 4-Dimensional Process
Model 1 [1 0 0 0] The first process mean changed.
Model 2 [0 1 0 0] The second process mean changed.
Model 3 [0 0 1 0] The third process mean changed.
Model 4 [0 0 0 1] The fourth process mean changed.
Model 5 [1 1 0 0] The first and second process means changed.
Model 6 [1 0 1 0] The first and third process means changed.
Model 7 [1 0 0 1] The first and fourth process means changed.
Model 8 [0 1 1 0] The second and third process means changed.
Model 9 [0 1 0 1] The second and fourth process means changed.
Model 10 [0 0 1 1] The third and fourth process means changed.
Model 11 [1 1 1 0] The first, second and third process means changed.
Model 12 [1 1 0 1] The first, second and fourth process means changed.
Model 13 [1 0 1 1] The first, third and fourth process means changed.
Model 14 [0 1 1 1] The second, third and fourth process means changed.
Model 15 [1 1 1 1] All means changed.

If the true change model is Model 6, [1 0 1 0], then the search path starting from the top of the model hierarchy is illustrated in Figure 2. In this example, we use 5 comparisons of two models to find the right model, reduced from a total of $\binom{15}{2} = 105$ pairwise comparisons. The number of parameters required for each possible model comparison is shown in Table 2. In our implementation, a personal computer ran out of memory when the MCMC was executed on the one-step 15-model selection process. The direct application of MCMC to a multiple-model selection process is also theoretically troublesome, because a large number of unused full conditional posteriors, due to the pseudopriors, must be kept in the Markov chain.

FIGURE 2. Search Path to Model 6. The dashed links are not searched in finding the right model.

TABLE 2. Number of Model Parameters for All Possible Pairwise Comparisons
Comparison                           Model indicator   Means   Variances   Covariances   Total
4 means change vs. 3 means change    1                 15      4           6             26
3 means change vs. 2 means change    1                 13      4           6             24
2 means change vs. 1 mean change     1                 11      4           6             22

The specification of the prior distribution and hyperprior parameters can affect the MCMC results, and this seems to be the case for multivariate change point problems. Berger and Pericchi (1996) proposed using intrinsic Bayes factors for model selection. The basic idea is to use part of the data as a training sample to find a proper prior. This is analogous to the cross-validation method used in many data-mining algorithms. Using intrinsic priors, Moreno et al. (2005) provided both online and offline change point analysis for a univariate process, and a bivariate normal process was discussed in Son and Kim (2005).

Implementation in WinBUGS

Example 1: A Real Case

This example comes from Holmes and Mergen (1993), where 56 individual bivariate observations were collected from a European plant producing grit or gravel. There are two variables: the percentage of large (by weight) particles and the percentage of medium particles. These two variables are correlated in general. Sullivan and Woodall (2000) designed a control chart and detected both a mean vector change and a variance-covariance matrix change in this process.

We first treat the process as two independent processes and standardize their observations. The plot of these two standardized processes in Figure 3 clearly shows the mean shift at around the 25th observation. The change of correlation can be seen from the scatter plot in Figure 4. In the following, we use the Bayesian approach and MCMC to estimate simultaneously the change model, the change point, and the before- and after-change mean vectors and variance-covariance matrices.

FIGURE 3. Time Series Plot of the Standardized Bivariate Process.
FIGURE 4. Scatter Plot of the Bivariate Process.
FIGURE 5. The Shrink Factors of After-Shift Process Means of Model 3.

In our analysis, we assume that a single change has occurred on this bivariate process and that both
mean vector and covariance matrix have shifted after the change. We do not know the true values of the mean vector or covariance matrix either before or after the change. Let the prior distribution of the mean vector (either before or after the shift) be a bivariate normal distribution with mean $[0\ 0]'$. The prior distributions of the covariance matrices are defined separately. We use a gamma prior for each component of the precision matrix (the inverse of the covariance matrix), and the prior of the after-shift precision is the before-shift prior multiplied by a sizing factor. This factor has a uniform prior distribution centered at 1. In WinBUGS, the relationships among model parameters, hyperparameters, and observations can be represented by a probabilistic graphical model, which is given in Appendix B. After 10,000 MCMC samples, the trace plot of the model indicator $M$ shows enough mixing between models. To check the convergence of the other parameters, we start two chains with different initial values and apply the R-L and G-R methods. The diagnosis of these chains indicates that 5000 burn-in samples are adequate. The dependence factors for the R-L method are always lower than 3 (less than 5 is considered acceptable). The G-R method calculates the shrink factor (based on a comparison of the within-chain and between-chain sample variances) for varying sample sizes. As can be seen from Figure 5, after 5000 burn-in samples the median shrink factor is close to 1, which indicates that the MCMC chain has converged to its stationary distribution.

We treat the first half of the MCMC samples as the burn-in period and summarize the posterior distribution of the parameters of interest based on the second half (5000 samples). The result shows that the third model, in which both means shifted, is the most plausible model and that the most likely change point is at the 24th observation, although change points around 40 are also possible (see Figure 6). This could also suggest that there are two change points. This bimodal feature of the posterior distribution conveys information that is not available through a likelihood analysis. This is because the likelihood function gives the probability distribution of getting the observed data given fixed values of the parameters, whereas the posterior gives the probability of each possible change point given the observed data, which has a much more natural interpretation. See Apley (2012) for a further discussion of multimodal posterior distributions.

FIGURE 6. Model Selection and Change Point Identification from MCMC.

Using the posterior medians, we estimate that the before-shift mean vector and the after-shift mean vector are
$$\mu_{3,1} = \begin{pmatrix}0.3083\\ 0.1983\end{pmatrix} \quad \text{and} \quad \mu_{3,2} = \begin{pmatrix}0.3852\\ 0.1913\end{pmatrix},$$
respectively. The before-shift covariance matrix is estimated as
$$\hat{\Sigma}_{3,1} = \begin{pmatrix}0.87705 & 0.6623\\ 0.6623 & 0.8925\end{pmatrix},$$
and the after-shift covariance matrix as
$$\hat{\Sigma}_{3,2} = \begin{pmatrix}0.9914 & 0.7729\\ 0.7729 & 0.9997\end{pmatrix}.$$

Example 2: A Simulation Study

To evaluate the performance of the Bayesian approach and to compare it with other methods, such as maximum likelihood estimation (MLE), we implemented a simulation study with 1000 randomly generated bivariate processes. The mean of each variable is 0, and the variance is 1. The two variables are correlated, with covariance 0.5. The simulated process has a mean shift in its first component after the 20th observation. The simulated bivariate process was monitored by the MEWMA chart with an in-control average run length of 200 (Lowry et al. (1992)). One simulated process and its MEWMA chart are shown in Figures 7 and 8, respectively.

FIGURE 7. Time Series Plot for One Simulated Bivariate Process.
FIGURE 8. MEWMA Chart for Simulated Bivariate Process.

We evaluate the performance of the Bayesian and MLE approaches on processes with shift sizes of 0.5, 1, and 2 standard deviations. Whenever the MEWMA chart signals a process change, both the Bayesian and the likelihood procedures for change point analysis are applied. The likelihood method can only estimate the change point for a given change model (assuming we know the true change model), but the Bayesian method is able to identify simultaneously the change model, the change point, and the mean shift size; therefore, we compare the performance of these two methods in terms of change point location estimation only. Because the control chart can signal a process change before the true mean shift happens, we discard such processes and analyze only the processes having more than 20 observations.

Table 3 provides the performance measures of the Bayesian method in choosing the correct change model for different shift sizes. As expected, the probability of correctly identifying the right model (Model 1) increases as the shift size increases. However, even when the shift size is relatively small (0.5 standard deviation), the chance of identifying the right model is still high (>85%). Our earlier concern about the negative effect of the nested structure of change models does not materialize in the simulation study: the chance of choosing Model 3 is much smaller than that of any other model, which indicates that the Bayesian method does not exhibit a tendency toward the complex model when the process's dimension is low. Meanwhile, we examined several simulated processes where the change model was wrongly chosen and found that they are so noisy that the selected change model appears to be more reasonable than the true change model. In other words, in these cases the control chart signals a process change not because of the mean shift in the first variable, but because some large random error occurred in the other variable.

TABLE 3. Out of 1000 Simulations, the Proportion of Times the Correct Model Was Selected and Its Standard Error in Parentheses
Most likely model                         Shift size of 0.5   Shift size of 1   Shift size of 2
Model 1 (mean shift on variable 1)        0.870 (0.011)       0.897 (0.010)     0.953 (0.007)
Model 2 (mean shift on variable 2)        0.105 (0.010)       0.090 (0.009)     0.035 (0.006)
Model 3 (mean shift on both variables)    0.025 (0.005)       0.013 (0.004)     0.012 (0.003)
Note: The correct model is Model 1 (mean shift on variable 1 only).

To compare the change point identification performance of the Bayesian and MLE methods, we present the proportion of times that the true change point (20) is among the five top locations identified by each method and the proportion of times that the most likely change point location is within 5 time units of the true change point location (Tables 4 and 5, respectively). In the Bayesian approach, we use a noninformative uniform prior for the change point location parameter. For example, if there are 30 observations, then each one, from the first observation to the next-to-last observation, has a 1/29 prior probability of being the change point (i.e., the last point before the change in mean level). As can be seen from these tables, the Bayesian and MLE methods are similar in their ability to identify the true change point. Both methods are good at finding the true change point when the shift size is 1 standard
TABLE 4. Out of 1000 Simulations, When the Change Model Was Correctly Identified, the Proportion of Times That the True Change Point Is One of the Five Most Likely Values, as Measured by the Posterior Distribution or the Likelihood Function, and Its Standard Error in Parentheses

Method      Shift size of 0.5   Shift size of 1   Shift size of 2
Bayesian    0.384 (0.016)       0.774 (0.014)     0.966 (0.006)
MLE         0.389 (0.017)       0.759 (0.014)     0.961 (0.006)
TABLE 5. Out of 1000 Simulations, When the Change Model Was Correctly Identified, the Proportion of Times That the Most Likely Change Point Is Within Five Time Units of the True Change Point Value, and Its Standard Error in Parentheses

Method      Shift size of 0.5   Shift size of 1   Shift size of 2
Bayesian    0.454 (0.017)       0.829 (0.013)     0.987 (0.004)
MLE         0.476 (0.017)       0.834 (0.014)     0.982 (0.006)

TABLE 6. Out of 1000 Simulations, When the Change Model Was Correctly Identified, the Proportion of Times the True Change Point Was Correctly Identified by the Bayesian and MLE Methods When Proper Prior Knowledge of the Change Point Was Applied, and Its Standard Error in Parentheses

Method      True change point among the     True change point within 5 units
            top 5 most likely points        of the most likely point
Bayesian    0.845 (0.012)                   0.905 (0.010)
MLE         0.764 (0.014)                   0.842 (0.012)
deviation or larger, but both fail when the shift size is only 0.5 standard deviation, simply because the process is too noisy.

One of the advantages of the Bayesian approach is that it is easy to incorporate prior process knowledge into change point estimation. For example, for the simulated process we assume that the first 15 observations are unlikely to be change points; we may then set their prior probability to 0 and assign a uniform distribution to the remaining observations. This assumption is reasonable because each simulated process runs for at least 20 observations and the control chart signals as soon as it detects a mean shift. Table 6 provides the results from the Bayesian approach under this scenario when the shift size is 1 standard deviation, compared with the results from the MLE approach. The prior knowledge clearly helps in identifying the true change point, and the Bayesian approach outperforms the MLE approach. The major advantage of the Bayesian approach, however, is that all three problems (time of change point, which means have shifted, and values of the new means) are handled within a single model and framework. If the specification limits are known, then the Bayesian method can also provide an estimate (along with its uncertainty) of the fraction nonconforming after the shift.

Example 3: High-Dimensional Multivariate Processes

The following test problem demonstrates the pairwise model comparison and the branch-and-bound search algorithm. The process has four variables, and there are 60 observations in total; a change point occurs after the 30th observation, and the means of the first and third variables shift by $0.5\sigma$. In this example, the process variables are independent. The process observations are depicted in Figure 9.

We implemented the pairwise comparison method, starting from the topmost hierarchy (i.e., all four
FIGURE 9. A Four-Variate Process Where Variables 1 and 3 Shifted Upward by $0.5\sigma$ at Time 30.

FIGURE 10. Search Path for the 4-Variate Process Example.
means changed) and used the criterion (i.e., Bayes factor less than 1) to progress the model search. The actual search path is depicted in Figure 10. The Bayes factor expresses the odds of the first model relative to the second. For example, $B_{15,13} < 1$, so the second model, Model 13, is chosen and is compared with its sub-models in the next step. The search algorithm is programmed in R, and the R code calls WinBUGS to perform MCMC.

FIGURE 11. Pairwise Comparison of Models.

The search missed its first chance of finding the optimal model, because Model 15, [1111], is slightly better than Model 11, [1110]. This error is purely due to randomness in the data and sampling randomness in MCMC. However, the search still finds the optimum by going through Model 13, [1011]. This shows that the branch-and-bound method is quite robust, although the total number of comparisons is 6 instead of the expected 5. A closer examination of the process data shows that, in the middle of the process, the fourth process variable seems to have a very small shift. This is why the all-means-change model is preferred over the all-means-but-the-fourth model. The model comparison results from WinBUGS are shown in Figures 10 and 11. For Model 6, one more MCMC run was performed to locate the change point. The most likely change point is found to be 32, which is close to the true change point of 30 (see Figure 12).

FIGURE 12. The Posterior Distribution of Change Point of Model 6 from MCMC.

Because the MCMC technique is computationally intensive, it is necessary to investigate the computational requirement of the Bayesian approach. We mentioned earlier that even for a process with four process variables it is difficult to execute an MCMC for all possible change models directly, because there are too many parameters and hyperparameters to be sampled. Using the branch-and-bound search algorithm, we are able to carry out the MCMC efficiently, as only two models are compared each time. However, multiple pairwise comparisons are required to find the final model. Overall, the computation time of our approach depends on the following items:

• the number of model parameters and hyperparameters, and the number of observations;
• the length of the MCMC chains;
• the number of pairwise comparisons.

We note that these items are not independent of one another. For a complex model with many parameters and hyperparameters, a longer MCMC chain is typically required to reach convergence, and, when the process has many variables, the number of pairwise comparisons also increases. To study the computation requirement, we simulated processes with dimension varying from 4 to 10; 30 observations were used, and a mean shift of 1 standard deviation occurred in the first and third variables after the 20th observation. In a preliminary study, we found that 10,000 MCMC samples are adequate to reach convergence for the 10-dimensional process model; thus, we treat the first 5000 samples as burn-in and use the last 5000 samples to estimate posterior distributions. The computing time (on an Intel Core i7 CPU, 3.07 GHz) and the search path are listed in Table 7.

While we acknowledge that our method becomes impractical when the process dimension grows beyond about 10, we should note that other, more traditional methods, such as the $T^2$ chart or the MEWMA chart, require very large (often prohibitively large) preliminary samples in order to estimate the process parameters well enough to run the multivariate control chart in Phase II. For example, when $p = 12$, there are 90 parameters to estimate: 12 means, 12 variances, and $\binom{12}{2} = 66$ covariances. Preliminary sample sizes in the hundreds or thousands are needed in order to estimate the parameters with sufficiently high precision (see Champ et al. (2005)). If $p$ is very large, it can be beneficial to reduce the number of variables monitored, either through an engineering analysis of which process variables are really critical or through the variable selection multivariate statistical process control (VS-MSPC) chart suggested in Wang and Jiang (2009) or the variable-selection-based multivariate exponentially weighted moving average (VS-MEWMA) chart suggested in Jiang et al. (2012). The fault diagnosis performance of these charts is not satisfactory when the shift size is small to medium. For example, Wang and Jiang (2009) reported that the VS-MSPC chart correctly identifies the variables with shifted means only 29.0% of the time when the shift size is $0.5\sigma$ on a 10-variate process with two shifted variables, and 54.8% of the time when the shift size is $1\sigma$. Another possibility for large $p$ is to use a variable selection method not to find the most likely variables that shifted, but to eliminate some variables from consideration and then apply the Bayesian method to the remaining variables. For example, if 5 variables of a 15-variable process can be eliminated (that is, we are confident that these variables did not shift), then the Bayesian methods described here can be applied to the remaining 10 variables.
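The count of 90 generalizes directly: an unstructured $p$-variate normal model has $p$ means, $p$ variances, and $p(p-1)/2$ covariances, i.e., $p(p+3)/2$ unknowns in total. The short Python sketch below (the function name is ours, purely illustrative) makes the quadratic growth in $p$ concrete:

```python
from math import comb


def n_in_control_parameters(p: int) -> int:
    """Unknowns in an unstructured p-variate normal model (Phase I estimation)."""
    means = p
    variances = p
    covariances = comb(p, 2)  # one covariance per unordered pair of variables
    return means + variances + covariances  # equals p * (p + 3) // 2


for p in (4, 10, 12, 15):
    print(p, n_in_control_parameters(p))
# Prints 14, 65, 90 and 135 parameters for p = 4, 10, 12 and 15.
```

Because the covariance term grows like $p^2/2$, the preliminary sample needed to estimate all of these quantities precisely grows quickly with the dimension.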
TABLE 7. Computation Requirement of Change Point Estimation on a Multivariate Process

                                         4 variables   6 variables   8 variables   10 variables
Time of one pairwise comparison (min)    2.19          4.72          8.18          12.61
Number of comparisons                    5             7             9             11
Total computing time (min)               8.26          23.34         50.68         94.80

Search path, 4 variables:
[1111]<[1110], [1110]>[1100], [1110]<[1010], [1010]>[1000], [1010]>[0010]
Search path, 6 variables:
[111111]<[111110], [111110]<[111100], [111100]<[111000], [111000]>[110000], [111000]<[101000], [101000]>[100000], [101000]>[010000]
Search path, 8 variables:
[11111111]<[11111110], [11111110]<[11111100], [11111100]<[11111000], [11111000]<[11110000], [11110000]<[11100000], [11100000]>[11000000], [11100000]<[10100000], [10100000]>[10000000], [10100000]>[01000000]
Search path, 10 variables:
[1111111111]<[1111111110], [1111111110]<[1111111100], [1111111100]<[1111111000], [1111111000]<[1111110000], [1111110000]<[1111100000], [1111100000]<[1111000000], [1111000000]<[1110000000], [1110000000]>[1100000000], [1110000000]<[1010000000], [1010000000]>[1000000000], [1010000000]>[0100000000]
Note: In each search path, [A]<[B] means that model B was preferred to model A, and [A]>[B] means that model A was preferred; a 1 in a model code marks a variable whose mean is modeled as shifted.
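The search paths in Table 7 follow a simple top-down pattern: start from the model in which all means shift, compare the current model with a sub-model that drops one shifted variable, move to the first sub-model that wins (Bayes factor less than 1), and stop when no sub-model wins. The Python sketch below is our reading of that pattern, not the authors' R code. In particular, the order in which sub-models are tried (rightmost shifted variable first) is inferred from the 4-variable path, and `prefers_submodel` is a hypothetical stand-in for one WinBUGS Bayes-factor comparison. With a toy oracle that simply prefers models closer to [1010], it reproduces the 4-variable path listed in Table 7.

```python
from typing import Callable, List, Tuple

Model = Tuple[int, ...]  # 1 = mean of that variable shifted, e.g. (1, 0, 1, 0)


def code(m: Model) -> str:
    """Bracketed model code in the style of Table 7, e.g. '[1010]'."""
    return "[" + "".join(str(v) for v in m) + "]"


def search_change_model(
    p: int, prefers_submodel: Callable[[Model, Model], bool]
) -> Tuple[Model, List[str]]:
    """Greedy top-down search over mean-change models (a sketch).

    Starts from the model in which all p means shift. The current model is
    compared with each sub-model that drops one shifted variable, rightmost
    first; the search moves to the first preferred sub-model and stops when
    none is preferred. prefers_submodel(current, sub) stands in for the test
    "Bayes factor of current versus sub is below 1".
    """
    current: Model = (1,) * p
    path: List[str] = []
    while True:
        shifted = [j for j in range(p) if current[j] == 1]
        if len(shifted) <= 1:
            return current, path  # no smaller change model left to try
        for j in reversed(shifted):
            sub = current[:j] + (0,) + current[j + 1:]
            if prefers_submodel(current, sub):
                path.append(f"{code(current)}<{code(sub)}")
                current = sub
                break
            path.append(f"{code(current)}>{code(sub)}")
        else:
            return current, path  # every sub-model lost: stop here


# Hypothetical toy oracle: prefer whichever model is closer to (1, 0, 1, 0).
TRUTH: Model = (1, 0, 1, 0)


def closer_to_truth(current: Model, sub: Model) -> bool:
    def dist(m: Model) -> int:
        return sum(a != b for a, b in zip(m, TRUTH))
    return dist(sub) < dist(current)


best, path = search_change_model(4, closer_to_truth)
print(code(best), path)
```

In practice each call to `prefers_submodel` would be one MCMC run comparing two models, which is why the number of comparisons, together with the size of the largest model, drives the total computing time reported in Table 7.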

The method of Jiang et al. (2012) involves solving the constrained optimization problem

$$\min_{\mu_t}\ (w_t - \mu_t)^T \Sigma^{-1} (w_t - \mu_t) \quad \text{subject to} \quad \sum_j I\left(|\mu_{i(j)}| \neq 0\right) \le s,$$

where $w_t$ is the exponentially weighted moving average (EWMA) vector at the time of the control chart alarm; the constraint limits the number of potentially shifted variables to $s$. The objective is essentially to minimize the weighted mean square error of the EWMA statistics. The solution to this optimization problem is to set the means of the shifted variables equal to their EWMA statistics and find the smallest sum of squares of the EWMA statistics for the remaining variables. Therefore, we calculate for each variable

$$S_i^2 = w_t^T \Sigma^{-1} w_t - (r_i w_t)^2,$$

where $r_i$ is the $i$th row of $R$ and $R^T R = \Sigma^{-1}$. These $S_i^2$ statistics are the total weighted sum of squares minus the square of the transformed EWMA statistic of the corresponding variable. A small $S_i^2$ indicates that the corresponding variable has shifted; conversely, when $S_i^2$ is large, there is little evidence that the corresponding variable has shifted. We rank all process variables by their $S_i^2$ values and remove those with the least evidence of a shift, so as to reduce the dimension $p$ for the subsequent Bayesian change point analysis. Table 8 provides the results of a simulation study of this pre-diagnosis method, in which 1000 processes are simulated and the first two variables are shifted to 1 after 20 observations. An MEWMA control chart with $\lambda = 0.1$ is applied such that the asymptotic in-control average run length (ARL) is 200. When the chart signals an alarm, the method above is applied to the EWMA statistics of all variables to remove the least likely shifted variables. From the table, we can see that the percentage of runs in which a shifted variable is wrongly removed is small. In summary, this pre-diagnosis method can be applied after a control chart signals an alarm to remove those variables for which the evidence of a shift is weakest; we could then apply the Bayesian change point analysis to the reduced set of process variables.

TABLE 8. Simulation Results of Pre-Diagnosis of Shifted Variables Using the $S_i^2$ Statistic

Number of            Number of            Percentage of runs removing any
process variables    variables removed    shifted variable (standard error)
15                   1                    0.3% (0.17%)
15                   2                    1.0% (0.31%)
15                   3                    2.2% (0.46%)
15                   4                    2.4% (0.48%)
15                   5                    3.5% (0.58%)
20                   2                    0.7% (0.26%)
20                   4                    1.0% (0.31%)
20                   6                    2.3% (0.47%)
20                   8                    3.4% (0.57%)
20                   10                   5.7% (0.73%)

Summary and Conclusions

We have shown how a Bayesian approach can be used to select among possible change point models for multivariate SPC. These models are very flexible and can incorporate various assumptions, including whether the parameters before or after the shift are known, and whether the before and after variances are constant. This Bayesian analysis can be accomplished by addressing three related questions:

1. Which variables have shifted?
2. When did the shift occur?
3. What are the values of the parameters after the shift?

The software WinBUGS, used in conjunction with R, can be used to perform the required Monte Carlo simulations. The Bayesian analysis has three main assets:

1. The three issues mentioned above are handled in a single statistical model.
2. The full posterior distributions contain information not available in a likelihood analysis.
3. Prior information, if available, can be easily incorporated.

We have concentrated on using a Bayesian approach for a retrospective analysis in Phase II; that is, once a signal is raised on a traditional multivariate control chart, we look back to determine which variables shifted, when they shifted, and what the new shifted mean(s) is (are). Change point models can, however, be constructed to perform statistical process monitoring. To accomplish this in the Bayesian context, we would add a "zero-th" model, which is

that no variables shifted. In practice we would probably want to assign a high prior probability to this model, since process shifts are rather rare events. Then, after each sampling stage, we could determine the posterior probability of "no change," and, if this is sufficiently large, we would continue the process. See Tsiamyrtzis and Hawkins (2007) and Apley (2012).

In this paper, we assume that there is only one process change point. The Bayesian approach to identifying multiple change points in a univariate process has been studied in Erdman and Emerson (2007) and Giordani and Kohn (2008), and, recently, Cheon and Kim (2010) attacked a specific form of this problem in a bivariate process setting. Our method could be applied repeatedly to the before-shift data and the after-shift data.

Acknowledgments

The authors would like to thank the Editor and two reviewers for helpful comments. Their careful reading has certainly led to a better presentation of this material.

Appendix A

In the univariate case, assuming that $\mu_1$ and $\sigma_1 = \sigma_2$ are known, the joint posterior distribution of $(\mu_2, k)$ given $x = (x_1, x_2, \ldots, x_n)$ can be expressed as
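$$
\begin{aligned}
p(\mu_2, k \mid x) &\propto L(x; \mu_2, k)\, p_0(\mu_2)\, p_0(k) \\
&= \left(\frac{\tau}{2\pi}\right)^{n/2} \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i-\mu_1)^2 + \sum_{i=k+1}^{n}(x_i-\mu_2)^2\right)\right] \sqrt{\frac{\kappa\tau}{2\pi}}\, \exp\left[-\frac{\kappa\tau}{2}(\mu_2-\alpha)^2\right] \times \frac{1}{n-1}.
\end{aligned}
$$

After removing unnecessary constants, we obtain

$$
\begin{aligned}
p(\mu_2, k \mid x) &\propto \exp\left[-\frac{\tau}{2}\sum_{i=1}^{k}(x_i-\mu_1)^2\right] \times \exp\left[-\frac{\tau}{2}\sum_{i=k+1}^{n}(x_i-\mu_2)^2 - \frac{\kappa\tau}{2}(\mu_2-\alpha)^2\right] \\
&= \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i-\mu_1)^2 + \sum_{i=k+1}^{n}x_i^2\right)\right] \times \exp\left[-\frac{\tau}{2}\left(-2\mu_2\sum_{i=k+1}^{n}x_i + (n-k+\kappa)\mu_2^2 - 2\kappa\alpha\mu_2 + \kappa\alpha^2\right)\right] \\
&\propto \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i-\mu_1)^2 + \sum_{i=k+1}^{n}x_i^2\right)\right] \times \exp\left[-\frac{\tau}{2}\left((n-k+\kappa)\mu_2^2 - 2\mu_2\left((n-k)\bar{x}' + \kappa\alpha\right)\right)\right] \\
&= \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i-\mu_1)^2 + \sum_{i=k+1}^{n}x_i^2\right)\right] \times \exp\left[-\frac{\tau(n-k+\kappa)}{2}\left(\mu_2^2 - 2\,\frac{\kappa\alpha + (n-k)\bar{x}'}{n-k+\kappa}\,\mu_2\right)\right],
\end{aligned}
$$

where $\bar{x}' = \sum_{i=k+1}^{n} x_i/(n-k)$ is the average of the observations after the mean shift.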
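Because $\mu_2$ enters the final expression only through a Gaussian kernel, it can be integrated out in closed form: with $a = n-k+\kappa$ and $b = \kappa\alpha + (n-k)\bar{x}'$, the integral over $\mu_2$ contributes $\exp(\tau b^2/(2a))\sqrt{2\pi/(\tau a)}$, leaving a discrete posterior over the change point $k$. The Python sketch below evaluates that posterior on the log scale. It is an illustration under stated assumptions, not the authors' WinBUGS model: $\mu_1$ and the precision $\tau$ are known, the prior on $\mu_2$ is $N(\alpha, 1/(\kappa\tau))$ with `kappa` our name for the prior-precision multiplier, and the optional `prior_k` weights (our addition) allow a non-uniform prior on $k$, such as zero weight on the first 15 candidate points.

```python
import math
from typing import Dict, Optional, Sequence


def change_point_posterior(
    x: Sequence[float],
    mu1: float,
    tau: float,
    kappa: float,
    alpha: float,
    prior_k: Optional[Sequence[float]] = None,
) -> Dict[int, float]:
    """Posterior p(k | x), k = 1..n-1 (k = last point before the shift).

    Assumes mu1 and tau known and mu2 ~ N(alpha, 1 / (kappa * tau));
    mu2 is integrated out analytically. prior_k holds unnormalized prior
    weights for k = 1..n-1 (uniform, i.e. 1/(n-1) each, if None).
    """
    n = len(x)
    weights_k = list(prior_k) if prior_k is not None else [1.0] * (n - 1)
    log_post = []
    for k in range(1, n):
        before, after = x[:k], x[k:]
        a = (n - k) + kappa                # n - k + kappa
        b = kappa * alpha + sum(after)     # kappa*alpha + (n - k) * xbar'
        ss = sum((xi - mu1) ** 2 for xi in before) + sum(xi * xi for xi in after)
        if weights_k[k - 1] <= 0:
            log_post.append(-math.inf)     # zero prior mass: k is ruled out
            continue
        # Log of the mu2 integral is tau*b^2/(2a) - 0.5*log(a), plus a
        # constant (from 2*pi/tau) that does not depend on k.
        log_post.append(-0.5 * tau * ss + 0.5 * tau * b * b / a
                        - 0.5 * math.log(a) + math.log(weights_k[k - 1]))
    top = max(log_post)
    w = [math.exp(v - top) for v in log_post]  # subtract the max for stability
    total = sum(w)
    return {k: wk / total for k, wk in zip(range(1, n), w)}


# Toy data: mean 0 for 20 points, then mean 2 (noise-free, for illustration).
x = [0.0] * 20 + [2.0] * 10
post = change_point_posterior(x, mu1=0.0, tau=1.0, kappa=1.0, alpha=0.0)
print(max(post, key=post.get))  # 20
```

Working on the log scale and subtracting the largest term before exponentiating avoids underflow when $n$ is large; passing `prior_k=[0.0] * 15 + [1.0] * 14` for these 30 points gives the restricted prior in which the first 15 points cannot be the change point.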
Appendix B
The model structure of Example 1 is represented
by Figure 13. There are three models, and each is
depicted in a square. The circles in the middle row
represent parameters and their prior distributions.
The circle at the bottom represents the process observations.

FIGURE 13. Graphical Representation of the WinBUGS Model.
References

Apley, D. W. (2012). "Posterior Distribution Charts: A Bayesian Approach for Graphically Exploring a Process Mean". Technometrics, to appear.
Barnard, G. A. (1959). "Control Charts and Stochastic Processes". Journal of the Royal Statistical Society, Series B 21, pp. 239–271.
Berger, J. O. and Pericchi, L. R. (1996). "The Intrinsic Bayes Factor for Model Selection and Prediction". Journal of the American Statistical Association 91, pp. 109–122.
Best, N. G.; Cowles, M. K.; and Vines, S. K. (1995). CODA Manual Version 0.30. Cambridge, UK: MRC Biostatistics Unit.
Carlin, B. P. and Chib, S. (1995). "Bayesian Model Choice via Markov Chain Monte Carlo Methods". Journal of the Royal Statistical Society, Series B 57, pp. 473–484.
Carlin, B. P.; Gelfand, A. E.; and Smith, A. F. M. (1992). "Hierarchical Bayesian Analysis of Changepoint Problems". Applied Statistics 41, pp. 389–405.
Champ, C. W.; Jones-Farmer, L. A.; and Rigdon, S. E. (2005). "Properties of the $T^2$ Control Chart When Parameters Are Estimated". Technometrics 47, pp. 437–445.
Cheon, S. and Kim, J. (2010). "Multiple Change-Point Detection of Multivariate Mean Vectors with the Bayesian Approach". Computational Statistics and Data Analysis 54, pp. 406–415.
Chernoff, H. and Zacks, S. (1964). "Estimating the Current Mean of a Normal Distribution Which Is Subjected to Changes in Time". Annals of Mathematical Statistics 35, pp. 999–1018.
Erdman, C. and Emerson, J. W. (2007). "bcp: An R Package for Performing a Bayesian Analysis of Change Point Problems". Journal of Statistical Software 23, pp. 1–13.
Gelman, A. and Rubin, D. (1992). "Inference from Iterative Simulation Using Multiple Sequences". Statistical Science 7, pp. 457–511.
Geweke, J. (1992). "Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments". In Bayesian Statistics 4, J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, eds., pp. 169–193. Oxford, UK: Oxford University Press.
Giordani, P. and Kohn, R. (2008). "Efficient Bayesian Inference for Multiple Change-Point and Mixture Innovation Model". Journal of Business and Economic Statistics 26, pp. 66–77.
Holmes, D. S. and Mergen, A. E. (1993). "Improving the Performance of the $T^2$ Control Chart". Quality Engineering 5, pp. 619–625.
Jiang, W.; Wang, K.; and Tsung, F. (2012). "A Variable-Selection-Based Multivariate EWMA Chart for Process Monitoring and Diagnosis". Journal of Quality Technology, to appear.
Lowry, C. A.; Woodall, W. H.; Champ, C. W.; and Rigdon, S. E. (1992). "A Multivariate Exponentially Weighted Moving Average Control Chart". Technometrics 34, pp. 46–53.
Moreno, E.; Casella, G.; and Garcia-Ferrer, A. (2005). "An Objective Bayesian Analysis of the Change Point Problem". Stochastic Environmental Research and Risk Assessment 9, pp. 191–204.
Raftery, A. E. and Lewis, S. M. (1996). "Implementing MCMC". In Markov Chain Monte Carlo in Practice, W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, eds., pp. 115–130. London: Chapman & Hall.
Smith, A. F. M. (1975). "A Bayesian Approach to Inference About a Change-Point in a Sequence of Random Variables". Biometrika 62, pp. 407–416.
Son, Y. S. and Kim, S. W. (2005). "Bayesian Single Change Point Detection in a Sequence of Multivariate Normal Observations". Statistics 39, pp. 373–387.
Sullivan, J. H. and Woodall, W. H. (2000). "Change-Point Detection of Mean Vector or Covariance Matrix Shifts Using Multivariate Individual Observations". IIE Transactions 32, pp. 537–549.
Tsiamyrtzis, P. and Hawkins, D. M. (2007). "A Bayesian Approach to Statistical Process Control". In Bayesian Process Monitoring, Control, and Optimization, B. M. Colosimo and E. del Castillo, eds., pp. 87–107. Boca Raton, FL: Chapman & Hall/CRC.
Wang, K. and Jiang, W. (2009). "High-Dimensional Process Monitoring and Fault Isolation via Variable Selection". Journal of Quality Technology 41, pp. 247–258.
Zamba, K. D. and Hawkins, D. M. (2009). "A Multivariate Change-Point Model for Change in Mean Vector and/or Covariance Structure". Journal of Quality Technology 41, pp. 285–303.

Vol. 44, No. 3, July 2012 231 www.asq.org

Journal of Quality Technology Vol. 44, No. 3, July 2012
