P ms1434 Rev
STEVEN E. RIGDON
Southern Illinois University Edwardsville, Edwardsville, IL 62026
A Bayesian procedure is developed to estimate the time of a change in the process mean vector for
a multivariate process given that an out-of-control signal was raised on a multivariate control chart. In
addition, we can infer simultaneously which variable(s) had a change in mean value, when the change
occurred, and the value of the changed mean. All three problems (change point time, variables that
shifted, and new values for the shifted variables) are addressed in a single statistical model. Markov chain
Monte Carlo (MCMC) methods, through the software WinBUGS, are used to estimate parameters of the
change point models. To identify the mean shift in a process with more than two variables, we propose a
branch-and-bound search algorithm so that MCMC can be carried out with a predictable computing time in
each search step. A simulation study shows that the Bayesian approach performs similarly to maximum
likelihood estimation (MLE) in identifying the true change point location when a
noninformative prior is assumed; however, it can perform better when proper prior knowledge is incorporated
into the estimation procedure. The Bayesian approach provides full posterior distributions for the model
and change point, which can contain information that is not available in a likelihood analysis.
Key Words: Markov Chain Monte Carlo; MEWMA; Process Mean Shift Model; Search Algorithm.
Multivariate processes are of interest here, where we assume that process variables follow a multivariate normal distribution. We assume that only a mean change is possible and that the covariance matrix of process variables is fixed (this constraint can be easily relaxed, as we will see in an example later). The mean change is a shift in its level. For example, when two process variables are monitored simultaneously, there are three possible types of change:

1. the first process variable's mean has shifted;
2. the second process variable's mean has shifted;
3. both variables' means have shifted.

Therefore, besides identifying when the shift happened, we also need to identify which change model occurred. Specifically, the three models are

Model 1
$$X_1, X_2, \dots, X_k \sim N\left(\begin{bmatrix}\mu_{11}\\ \mu_{21}\end{bmatrix}, \Sigma\right) \quad\text{and}\quad X_{k+1}, X_{k+2}, \dots, X_n \sim N\left(\begin{bmatrix}\mu_{12}\\ \mu_{21}\end{bmatrix}, \Sigma\right)$$

Model 2
$$X_1, X_2, \dots, X_k \sim N\left(\begin{bmatrix}\mu_{11}\\ \mu_{21}\end{bmatrix}, \Sigma\right) \quad\text{and}\quad X_{k+1}, X_{k+2}, \dots, X_n \sim N\left(\begin{bmatrix}\mu_{11}\\ \mu_{22}\end{bmatrix}, \Sigma\right)$$

Model 3
$$X_1, X_2, \dots, X_k \sim N\left(\begin{bmatrix}\mu_{11}\\ \mu_{21}\end{bmatrix}, \Sigma\right) \quad\text{and}\quad X_{k+1}, X_{k+2}, \dots, X_n \sim N\left(\begin{bmatrix}\mu_{12}\\ \mu_{22}\end{bmatrix}, \Sigma\right)$$

In general, when there are $p$ process variables, there exist $2^p - 1$ mean shift models. This number becomes very large even for a modest value of $p$, so eliciting all possible shift models is troublesome. Therefore, model selection presents a major challenge in the multivariate change point problem. In this paper we discuss a Bayesian approach to this problem. A Bayesian analysis can provide full posterior distributions for these quantities, giving information that is not available in a likelihood analysis. For the likelihood-based approach to change point estimation, we refer the reader to Sullivan and Woodall (2000), Zamba and Hawkins (2009), and their related references.

Bayesian Method for a Univariate Process

Study of the Bayesian approach to the change point problem can be traced back to Barnard (1959) and Chernoff and Zacks (1964), who mentioned that their study was motivated by a “tracking” problem, where occasional changes in the direction of a path were of interest. A decade later, Smith (1975) gave a more comprehensive treatment of Bayesian inference about a single change point in a univariate process, covering both the binomial and the normal distributions. Here we briefly summarize the result for the normal distribution.

Consider a time series from independent normal distributions, $X_1, X_2, \dots, X_k \sim N(\mu_1, \sigma_1^2)$ and $X_{k+1}, X_{k+2}, \dots, X_n \sim N(\mu_2, \sigma_2^2)$. The main task is then to find the posterior distribution of $k$, the time of the process mean change. After this change point, the process mean shifts to a new, unknown level, which we would also like to estimate. We assume that there is a single process change, which occurs after time $k$, although our methods could be applied repeatedly to the before- and after-shift data. For example, if a signal is raised at time 40 and the point estimate of $k$ is 20, we could apply the methods separately to data values 1 through 20 and 21 through 40. Such an approach could detect multiple change points as well as different sets of variables changing at different times.

There are numerous models that could be considered, depending on what is assumed known and what is unknown. For example, we could assume that $\mu_1$ and $\sigma_1^2$ are known, whereas $\mu_2$, $\sigma_2^2$, and $k$ are unknown. We could also assume that $\sigma_2^2$ is known, such that $\sigma_2^2 = \sigma_1^2$, leaving $\mu_2$ and $k$ as the only unknown quantities. This is the assumption we make in the remainder of this section, but other combinations are certainly possible.

To facilitate the analysis, we use a precision parameter $\tau$, which is the reciprocal of the variance ($\tau = 1/\sigma^2$). Then, assuming that the before-shift and
after-shift variances are equal (i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$, or equivalently $\tau_1 = \tau_2 = \tau$), the likelihood functions for the data before and after the change are

$$L(x_1, \dots, x_k; \mu_1, \tau, k) = \prod_{i=1}^{k} \sqrt{\frac{\tau}{2\pi}} \exp\left[-\frac{\tau}{2}(x_i - \mu_1)^2\right]$$

and

$$L(x_{k+1}, \dots, x_n; \mu_2, \tau, k) = \prod_{i=k+1}^{n} \sqrt{\frac{\tau}{2\pi}} \exp\left[-\frac{\tau}{2}(x_i - \mu_2)^2\right].$$

The total likelihood function is therefore

$$L(x_1, \dots, x_n; \mu_1, \mu_2, \tau, k) = \left(\frac{\tau}{2\pi}\right)^{n/2} \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n}(x_i - \mu_2)^2\right)\right].$$

For $\mu_2$ (the only unknown besides $k$ under our assumptions), we use a conjugate prior, the normal distribution $N(\alpha, (\lambda\tau)^{-1})$, which has probability density function (PDF)

$$p_0(\mu_2) = \sqrt{\frac{\lambda\tau}{2\pi}} \exp\left[-\frac{\lambda\tau}{2}(\mu_2 - \alpha)^2\right].$$

Here $\alpha$ and $\lambda$ are considered known and reflect the kind of shift that can be expected. For example, if a process is operating with a mean of 0.5 and standard deviation 0.02, a shift to 0.6 might be as large as can be expected. In this case we would want 0.6 to be near the upper tail of the prior distribution; we could then choose $\alpha$ and $\lambda$ accordingly.

Given that a change has occurred, a noninformative prior for $k$ is the uniform distribution

$$p_0(k) = \frac{1}{n-1}, \quad k = 1, 2, \dots, n-1.$$

Let the prior distributions of $k$ and $\mu_2$ be independent.

Based on Bayes' theorem, the posterior distribution of $(\mu_2, k)$ is (the derivation is given in Appendix A)

$$\begin{aligned} p(\mu_2, k \mid x_1, x_2, \dots, x_n) &\propto L(x_1, \dots, x_n; \mu_1, \mu_2, \tau, k)\, p_0(\mu_2)\, p_0(k) \\ &\propto \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n} x_i^2\right)\right] \\ &\quad \times \exp\left[-\frac{\tau(n-k+\lambda)}{2}\left(\mu_2^2 - 2\,\frac{\lambda\alpha + (n-k)\bar{x}'}{n-k+\lambda}\,\mu_2\right)\right], \end{aligned}$$

where $\bar{x}' = \sum_{i=k+1}^{n} x_i/(n-k)$ is the average of observations after the mean shift.

Integrating out $\mu_2$, we find that the marginal posterior density of the change point $k$ is

$$p(k \mid x_1, x_2, \dots, x_n) \propto \frac{1}{\sqrt{n-k+\lambda}} \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n} x_i^2 - \frac{(\lambda\alpha + (n-k)\bar{x}')^2}{n-k+\lambda}\right)\right].$$

Inference for $k$ is then based on this posterior distribution. A point estimate of the change point is the posterior mode, that is, the value of $k$ that maximizes $p(k \mid x_1, x_2, \dots, x_n)$.

If we assume a noninformative prior for $\mu_2$ by setting its precision hyperparameter $\lambda$ equal to zero (implying an infinite prior variance), then the posterior of the change point is

$$p(k \mid x_1, x_2, \dots, x_n) \propto \frac{1}{\sqrt{n-k}} \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n}(x_i - \bar{x}')^2\right)\right].$$

This is proportional to the likelihood function, so the maximum likelihood estimate and the posterior mode are the same in this case.

This fully Bayesian parametric approach to the change point problem was extended by Carlin et al. (1992) using a hierarchical Bayes (HB) model. The HB model provided more flexibility in defining the prior distributions of model parameters such as the before- and after-shift process means. The posterior densities of the parameters of interest in Carlin et al. (1992) were obtained through Markov chain Monte Carlo (MCMC) simulations. These authors also showed how to apply their method to stationary but autocorrelated processes and to regression models.
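The marginal posterior of $k$ derived above can be evaluated directly, one candidate change point at a time. The Python sketch below is a minimal illustration and not the authors' implementation. It assumes, as in this section, that $\mu_1$ and the common precision $\tau$ are known and that $\mu_2 \sim N(\alpha, (\lambda\tau)^{-1})$; setting `lam=0` gives the noninformative case. The function name and its arguments are hypothetical. The optional `prior_k` argument is our own addition. It replaces the uniform prior $p_0(k) = 1/(n-1)$ with arbitrary weights, for example zero weight on early observations that are believed not to be change points.

```python
import math

def change_point_posterior(x, mu1, tau, alpha=0.0, lam=0.0, prior_k=None):
    """Posterior p(k | x) for k = 1, ..., n-1 under a single mean shift.

    Assumes mu1 and the common precision tau are known and that
    mu2 ~ N(alpha, 1/(lam*tau)); lam = 0 is the flat (noninformative) prior.
    prior_k: optional prior weights for k = 1..n-1 (default: uniform);
    a weight of 0 rules that location out.
    """
    n = len(x)
    if prior_k is None:
        prior_k = [1.0] * (n - 1)
    log_post = []
    for k in range(1, n):              # k = last observation before the shift
        m = n - k                      # number of after-shift observations
        after = x[k:]
        xbar_after = sum(after) / m
        ss_before = sum((xi - mu1) ** 2 for xi in x[:k])
        ss_after = sum(xi ** 2 for xi in after)
        shrink = (lam * alpha + m * xbar_after) ** 2 / (m + lam)
        # log of (n-k+lam)^(-1/2) * exp[-tau/2 * (...)] from the formula above
        lp = -0.5 * math.log(m + lam) - 0.5 * tau * (ss_before + ss_after - shrink)
        w = prior_k[k - 1]
        log_post.append(lp + math.log(w) if w > 0 else -math.inf)
    top = max(log_post)                # normalize on the log scale to avoid underflow
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [v / total for v in weights]    # entry j is p(k = j + 1 | x)
```

The posterior mode is the index of the largest entry plus one. To mimic a belief that the first 15 observations are not change points, one could pass `prior_k=[0.0] * 15 + [1.0] * (len(x) - 16)`.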
$$\begin{aligned} \pi_2 &= P(\text{Model 2}) = P(\text{only the second mean changed}),\\ \pi_3 &= P(\text{Model 3}) = P(\text{both means changed}), \end{aligned}$$

where $\pi_1 + \pi_2 + \pi_3 = 1$. Assuming no prior knowledge …

Model 2
$$X_{k+1}, X_{k+2}, \dots, X_n \sim N\left(\begin{bmatrix}0\\ \mu_2\end{bmatrix}, I\right)$$
with prior probability $\pi_2$;

… for Model 1, while for Model 2, the variables $x_{i1}$ and $x_{i2}$ are interchanged in the above formula. For Model 3, if we assume that the after-shift mean has the bivariate normal prior

$$N\left(\begin{bmatrix}\alpha_1\\ \alpha_2\end{bmatrix}, \begin{bmatrix}\lambda_1^{-1} & 0\\ 0 & \lambda_2^{-1}\end{bmatrix}\right),$$

…

Although the posterior distribution of the change point can be derived in this bivariate process case, we have not solved the problem of model selection, i.e., …
… chain iteration only. The full conditional distribution of $\theta_j$ is given by

$$p(\theta_j \mid \theta_{i\neq j}, M, x) \propto \begin{cases} L(x; \theta_j, M=j)\, p_0(\theta_j \mid M=j) & \text{when } M = j,\\ p_0(\theta_j \mid M \neq j) & \text{when } M \neq j. \end{cases}$$

This formula says that the observations do not affect the inference on $\theta_j$ if model $j$ is not selected. The posterior of $M$ is given by

$$p(M=j \mid \theta, x) = \frac{L(x; \theta_j, M=j) \prod_{i=1}^{m} p_0(\theta_i \mid M=j)\, \pi_j}{\sum_{k=1}^{m} L(x; \theta_k, M=k) \prod_{i=1}^{m} p_0(\theta_i \mid M=k)\, \pi_k}.$$

Note that in this pseudoprior method our main purpose is to select the most probable model. The posterior $p(\theta_j \mid \theta_{i\neq j}, M, x)$ is not of interest; instead, the conditional posterior $p(\theta_j \mid \theta_{i\neq j}, M=j, x)$ is the correct one for parameter estimation.

For example, in the bivariate case where the parameters before and after the shift are unknown, the super-parameter set is

$$\theta = \left[M;\ k_1, k_2, k_3;\ \mu_{111}, \mu_{112}, \mu_{121};\ \mu_{211}, \mu_{221}, \mu_{222};\ \mu_{311}, \mu_{312}, \mu_{321}, \mu_{322};\ \sigma_{11}^2, \sigma_{22}^2, \sigma_{12}\right]'.$$

When Model 1 is chosen, for example, the sub-parameter set is

$$\theta_1 = \left[k_1, \mu_{111}, \mu_{112}, \mu_{121}, \sigma_{11}^2, \sigma_{22}^2, \sigma_{12}\right]'.$$

At each stage of the MCMC, the sampler updates all parameters in the super set, even those parameters that are not part of the current model.

Theoretically, the MCMC samples converge to the target joint distribution asymptotically. In practice, we need to check the convergence of the MCMC chain in order to determine an appropriate sample size. The accuracy of MCMC algorithms in estimating posterior moments is discussed in Geweke (1992). We use two popular diagnostic methods from the literature: R-L, proposed by Raftery and Lewis (1996), and G-R, proposed by Gelman and Rubin (1992). The R-L method is based on monitoring the autocorrelation within a single chain. It provides the theoretical minimum number of iterations of an MCMC chain and suggests the number of burn-in iterations and the total number of iterations. The G-R method requires performing multiple simulations, starting from different (over-dispersed) initial values. The mixing of the simulated chains, and a comparison of the variance of individual chains with the total variance of the mixture of chains, are then used to decide on convergence. Both methods are available in the CODA library (Best et al. (1995)) in R.

Model Selection When There Are More Than Two Process Variables

For a bivariate process, there are three change models to consider when identifying its mean change, as described in the previous section. In general, the process's mean vector and variance-covariance matrix are unknown. To apply MCMC to a bivariate process, 17 model parameters must be sampled at each iteration. These parameters include one model indicator, three change point variables (one for each of the three mean shift models), three variance-covariance components, and 10 process means (three means, including before- and after-shift, for Model 1 and for Model 2, and four means for Model 3). For a trivariate process, there is one model indicator (with $2^3 - 1 = 7$ possible values), 33 means, 7 change point variables (one for each possible model), 3 variances, and 3 covariances. For a $p$-variate process, the number of means is

$$\binom{p}{1}(p+1) + \binom{p}{2}(p+2) + \cdots + \binom{p}{p}(p+p) = 3p\,2^{p-1} - p.$$

Thus, for a $p$-variate process, there is one model indicator (with $2^p - 1$ possible values), $3p\,2^{p-1} - p$ mean parameters, $2^p - 1$ change point variables, $p$ variances, and $\binom{p}{2}$ covariances. This “curse of dimensionality” prevents the direct application of MCMC to the multivariate change point problem with more than two or three process variables. We therefore propose a branch-and-bound algorithm to handle the change point problem for a high-dimensional multivariate process. In essence, we conduct a series of model comparisons, and each comparison evaluates only two models. Because multivariate mean-shift models have a hierarchical structure, we can start from the model where all means have shifted at a change point and compare it with a model where all means except one have shifted. The all-means-change model is at the top of the hierarchy, and these all-means-but-one-change models are the direct sub-models of the all-means-change model. Using the Bayes factor as a criterion, if the all-means-change
model is more plausible than an all-means-but-one-change model, then we eliminate the latter model from consideration and continue by comparing the all-means-change model with another all-means-but-one-change model. When an all-means-but-one-change model is better than the all-means-change model, the no-change mean is fixed, and we continue to try out the other means. We compare this all-means-but-one-change model with an all-means-but-two-change model in which the previous no-change mean stays fixed, i.e., with its direct sub-models. This process continues until we find a model all of whose sub-models are inferior.

We code the multivariate change point model as a series of 0-1 codes, where 1 indicates a mean change and 0 indicates no change. For example, a 4-dimensional multivariate process has the change point models shown in Table 1.

If the true change model is Model 6, [1 0 1 0], then the search path starting from the top of the model hierarchy is illustrated in Figure 2.
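The search described above can be sketched as a greedy descent over the 0-1 codes, with one pairwise comparison per step. In this Python sketch, `log_evidence` is a hypothetical callable that returns a log marginal likelihood for a code. In the paper, each pairwise comparison instead uses Bayes factors estimated by MCMC, which we do not reproduce here. We also assume a pruning rule: once dropping variable $i$ loses a comparison, variable $i$ stays shifted in all later candidates. This rule is our reading of the description and is not a confirmed detail of the authors' algorithm.

```python
from typing import Callable, List, Tuple

Code = Tuple[int, ...]  # 1 = mean shifted, 0 = no shift

def branch_and_bound(p: int, log_evidence: Callable[[Code], float]
                     ) -> Tuple[Code, List[Tuple[Code, Code]]]:
    """Pairwise search from the all-means-change model down the hierarchy.

    log_evidence is a hypothetical scoring function; the difference
    log_evidence(a) - log_evidence(b) is the log Bayes factor of a versus b.
    Returns the selected code and the (winner, loser) comparisons made.
    """
    current: Code = (1,) * p
    kept = set()                       # variables whose removal already lost
    path: List[Tuple[Code, Code]] = []
    improved = True
    while improved:
        improved = False
        for i in range(p):
            # skip unshifted variables, pruned variables, and the empty model
            if current[i] == 0 or i in kept or sum(current) == 1:
                continue
            child = current[:i] + (0,) + current[i + 1:]
            log_bf = log_evidence(current) - log_evidence(child)
            if log_bf >= 0:            # parent at least as plausible: prune child
                kept.add(i)
                path.append((current, child))
            else:                      # child wins: fix variable i as unshifted
                path.append((child, current))
                current = child
                improved = True
                break                  # continue with the child's direct sub-models
    return current, path
```

We tried this with a toy score that rewards agreement with [1 0 1 0]. The search reaches Model 6 after only four pairwise comparisons instead of evaluating all 15 models.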
FIGURE 2. Search Path to Model 6. The dash links will not be searched in finding the right model.
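The posterior of the model indicator $M$ in the pseudoprior scheme described earlier normalizes a product of three pieces: the likelihood, all prior and pseudoprior densities, and $\pi_j$. A minimal Python sketch of that normalization follows. It works on the log scale because likelihoods of many observations underflow easily. The argument layout and function name are our own assumptions.

```python
import math
from typing import List

def model_indicator_posterior(loglik: List[float],
                              log_p0: List[List[float]],
                              prior: List[float]) -> List[float]:
    """p(M = j | theta, x) for j = 0, ..., m-1.

    loglik[j]    : log L(x; theta_j, M = j)
    log_p0[i][j] : log p0(theta_i | M = j); the actual prior when i == j,
                   the pseudoprior otherwise
    prior[j]     : prior model probability pi_j
    """
    m = len(prior)
    scores = [loglik[j] + sum(log_p0[i][j] for i in range(m)) + math.log(prior[j])
              for j in range(m)]
    top = max(scores)                          # log-sum-exp stabilization
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

Inside a sampler, $M$ would then be drawn from these probabilities at each iteration.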
Comparison   Model indicator   Means   Variances   Covariances   Total
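The parameter counts given earlier apply when all $2^p - 1$ models are run in a single MCMC chain, and they can be checked with a few lines of Python. This sketch uses a hypothetical function name. It evaluates the binomial sum for the number of means, confirms the closed form $3p\,2^{p-1} - p$, and reproduces the 17 parameters quoted for the bivariate case.

```python
from math import comb

def mcmc_parameter_count(p: int) -> dict:
    """Parameters sampled per iteration when all 2^p - 1 models share one chain."""
    n_models = 2 ** p - 1
    # a model shifting j of the p means carries p before-shift + j after-shift means
    means = sum(comb(p, j) * (p + j) for j in range(1, p + 1))
    assert means == 3 * p * 2 ** (p - 1) - p      # closed form from the text
    return {
        "model_indicator": 1,
        "means": means,
        "change_points": n_models,                  # one per model
        "variances": p,
        "covariances": comb(p, 2),
    }
```

For $p = 2$ the counts sum to 17, matching the text. For $p = 10$ the model already carries 15,350 mean parameters and 1,023 change point variables, which illustrates why the pairwise search is needed.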
FIGURE 4. Scatter Plot of the Bivariate Process.
FIGURE 5. The Shrink Factors of After-Shift Process Means of Model 3.
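The shrink factor in Figure 5 is the Gelman–Rubin potential scale reduction factor. The Python sketch below computes a simplified form of it for one scalar parameter. Gelman and Rubin (1992), and the CODA function `gelman.diag`, include additional small-sample corrections, so values from this sketch may differ slightly from the published diagnostic. The function name is hypothetical.

```python
from statistics import fmean, variance
from typing import List, Sequence

def shrink_factor(chains: List[Sequence[float]]) -> float:
    """Simplified Gelman-Rubin potential scale reduction factor.

    chains: m >= 2 chains of equal length n >= 2 (post-burn-in draws of one
    parameter). Values near 1 are consistent with convergence; values well
    above 1 indicate the chains have not yet mixed.
    """
    n = len(chains[0])
    means = [fmean(c) for c in chains]
    w = fmean([variance(c) for c in chains])   # mean within-chain variance
    b = n * variance(means)                     # between-chain variance
    pooled = (n - 1) / n * w + b / n            # pooled posterior variance estimate
    return (pooled / w) ** 0.5
```

Computing this for increasing chain lengths and plotting the result gives a curve like the one in Figure 5.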
… mean vector and covariance matrix have shifted after the change. We do not know the true values of the mean vector or covariance matrix either before or after the change. Let the prior distribution of the mean vector (either before- or after-shift) be a bivariate normal distribution with mean $[0\ 0]'$. The prior distributions of the covariance matrices are defined separately. We use a gamma prior for each component of the precision matrix (the inverse of the covariance matrix). The prior of the after-shift precision is the before-shift prior multiplied by a sizing factor, which has a uniform prior distribution centered at 1. In WinBUGS, the relationships among model parameters, hyperparameters, and observations can be represented by a probabilistic graphical model, which is given in Appendix B. After 10,000 MCMC samples, the trace plot of the model indicator $M$ shows adequate mixing between models. To check convergence of the other parameters, we start two chains with different initial values and apply the R-L and G-R methods. The diagnosis of these chains indicates that 5000 burn-in samples are adequate. The dependence factors for the R-L method are always lower than 3 (less than 5 is considered acceptable). The G-R method calculates the shrink factor for varying sample sizes. This factor is the square root of the ratio of the pooled (between- plus within-chain) variance estimate to the within-chain variance. As Figure 5 shows, after 5000 burn-in samples the median shrink factor is close to 1, which indicates that the MCMC chain has converged to its stationary distribution.

We treat the first half of the MCMC samples as the burn-in period and summarize the posterior distribution of the parameters of interest based on the second half (5000 samples). The result shows that the third model (both means shifted) is the most plausible model and that the most likely change point is at the 24th observation, although change points around 40 are also possible (see Figure 6). This could also suggest that there are two change points. This bimodal feature of the posterior distribution conveys information that is not available through a likelihood analysis. The reason is that the likelihood function gives the probability distribution of the observed data given fixed values of the parameters. The posterior, by contrast, gives the probability of each possible change point given the observed data, which has a much more natural interpretation. See Apley (2012) for further discussion of multimodal posterior distributions.

Using the posterior medians, we estimate the before-shift and after-shift mean vectors as

$$\hat{\mu}_{3,1} = \begin{bmatrix}0.3083\\ 0.1983\end{bmatrix} \quad\text{and}\quad \hat{\mu}_{3,2} = \begin{bmatrix}0.3852\\ 0.1913\end{bmatrix},$$

respectively. The before-shift covariance matrix is estimated as

$$\hat{\Sigma}_{3,1} = \begin{bmatrix}0.87705 & 0.6623\\ 0.6623 & 0.8925\end{bmatrix},$$
and the after-shift covariance matrix as

$$\hat{\Sigma}_{3,2} = \begin{bmatrix}0.9914 & 0.7729\\ 0.7729 & 0.9997\end{bmatrix}.$$

Example 2: A Simulation Study

To evaluate the performance of the Bayesian approach and to compare it with other methods, such as maximum likelihood estimation (MLE), we implemented a simulation study with 1000 randomly generated bivariate processes. The mean of each variable is 0, and the variance is 1. The two variables are correlated, with covariance 0.5. The simulated process has a mean shift in its first component after the 20th observation. The simulated bivariate process was monitored by the MEWMA chart with an in-control average run length of 200 (Lowry et al. (1992)). One simulated process and its MEWMA chart are shown in Figures 7 and 8.

FIGURE 7. Time Series Plot for One Simulated Bivariate Process.
FIGURE 8. MEWMA Chart for Simulated Bivariate Process.

We evaluate the performance of the Bayesian and MLE approaches on processes with shift sizes of 0.5, 1, and 2 standard deviations. Whenever the MEWMA chart signals a process change, both the Bayesian and the likelihood procedures for change point analysis are applied. The likelihood method can only estimate the change point for a given change model (assuming we know the true change model). The Bayesian method, in contrast, identifies the change model, change point, and mean shift size simultaneously. We therefore compare the two methods only on change point location estimation. Because the control chart can signal a process change before the true mean shift happens, we discard such processes and analyze only the processes having more than 20 observations.

Table 3 provides the performance measures of the Bayesian method in choosing the correct change model for different shift sizes. As expected, the probability of correctly identifying the right model (Model 1) increases as the shift size increases. However, even when the shift size is relatively small (0.5 standard deviation), the chance of identifying the right model is still high (>85%). Our earlier concern about a negative effect of the nested structure of change models is not borne out in the simulation study. The chance of choosing Model 3 is much smaller than that of any other model, which indicates that the Bayesian method does not favor the complex model when the process's dimension is low. We also examined several simulated processes where the change model was chosen wrongly. These processes are so noisy that the selected change model appears more reasonable than the true one. In other words, the control chart signals a process change not because of the mean shift in the first variable, but because of some large random error in the other variable.

TABLE 3. Out of 1000 Simulations, the Proportion of Times Each Model Was Selected (Model 1 Is the Correct Model), and Its Standard Error in Parentheses
Most likely model Shift size of 0.5 Shift size of 1 Shift size of 2
Model 1 (mean shift on variable 1) 0.870 (0.011) 0.897 (0.010) 0.953 (0.007)
Model 2 (mean shift on variable 2) 0.105 (0.010) 0.090 (0.009) 0.035 (0.006)
Model 3 (mean shift on both variables) 0.025 (0.005) 0.013 (0.004) 0.012 (0.003)

To compare the Bayesian and MLE methods on change point identification, we report two proportions, in Tables 4 and 5 respectively. The first is the proportion of times that the true change point (20) is among the five most likely locations identified by each method. The second is the proportion of times that the most likely change point location is within 5 time units of the true change point location. In the Bayesian approach we use a noninformative uniform prior for the change point location parameter. For example, if there are 30 observations, then each one, from the first observation to the next-to-last observation, has a prior probability of 1/29 of being the change point (i.e., the last point before the change in mean level). As these tables show, the Bayesian and MLE methods are similar in their ability to identify the true change point. Both methods are good at finding the true change point when the shift size is 1 standard deviation or larger. Both fail when the shift size is only 0.5 standard deviation, simply because the process is too noisy.

TABLE 4. Out of 1000 Simulations, When the Change Model Was Correctly Identified, the Proportion of Times That the True Change Point Is One of the Five Most Likely Values, as Measured by the Posterior Distribution or the Likelihood Function, and Its Standard Error in Parentheses
TABLE 5. Out of 1000 Simulations, When the Change Model Was Correctly Identified, the Proportion of Times That the Most Likely Change Point Is Within Five Time Units of the True Change Point Value, and Its Standard Error in Parentheses
TABLE 6. Out of 1000 Simulations, When the Change Model Was Correctly Identified, the Proportion of Times the True Change Point Was Correctly Identified by the Bayesian and MLE Methods When Proper Prior Knowledge of the Change Point Was Applied

One of the advantages of the Bayesian approach is that prior process knowledge is easy to incorporate into change point estimation. For example, suppose that for the simulated process the first 15 observations are unlikely to be change points. We may then set their prior probability to 0 and assign a uniform distribution to the other observations. This assumption is reasonable because the simulated process runs for at least 20 observations and the control chart signals as soon as it detects a mean shift. Table 6 compares the results from the Bayesian approach under this scenario with those from the MLE approach, for a shift size of 1 standard deviation. Clearly, the prior knowledge helps in correctly identifying the true change point, and the Bayesian approach outperforms the MLE approach. The major advantage of the Bayesian approach, however, is that all three problems (time of change point, which means have shifted, and values of the new means) are handled within a single model and framework. If the specification limits are known, then the Bayesian method can also provide an estimate (along with its uncertainty) of the fraction nonconforming after the shift.

Example 3: High-Dimensional Multivariate Processes

The following test problem demonstrates the pairwise model comparison and the branch-and-bound search algorithm. The process has four variables and 60 observations in total. A change point occurs after the 30th observation, where the first and third process means shift by $0.5\sigma$. In this example, we use independent process variables. The process observations are depicted in Figure 9.

We implemented the pairwise comparison method, starting from the topmost level of the hierarchy (i.e., all four …

FIGURE 9. A Four-Variate Process Where Variables 1 and 3 Shifted Upward by $0.5\sigma$ at Time 30.
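The examples and the pre-diagnosis study that follows use an MEWMA chart (Lowry et al. (1992)) to raise the alarm. For higher dimensions they add the $S_i^2$ ranking of Jiang et al. (2012), applied to the EWMA vector at the alarm. A minimal Python sketch of that workflow is below, and it rests on several assumptions. The in-control mean `mu0` and covariance `sigma` are taken as known. The control limit `h` is supplied by the user and would have to be calibrated (for example, by simulation) to the desired in-control ARL; the value used in the test is arbitrary. Because $R$ with $R^TR = \Sigma^{-1}$ is not unique, we take $R = C^{-1}$, where $\Sigma = CC^T$ is the Cholesky factorization. Here `lam` is the MEWMA smoothing constant, not the prior precision used earlier. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]

def cholesky_lower(a: Matrix) -> Matrix:
    """Lower-triangular C with a = C C^T (a symmetric positive definite)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / c[j][j]
    return c

def whiten(c: Matrix, v: Sequence[float]) -> List[float]:
    """z = C^{-1} v by forward substitution, so that z'z = v' Sigma^{-1} v."""
    z: List[float] = []
    for i in range(len(v)):
        z.append((v[i] - sum(c[i][k] * z[k] for k in range(i))) / c[i][i])
    return z

def mewma_alarm(xs: Sequence[Sequence[float]], mu0: Sequence[float], sigma: Matrix,
                lam: float, h: float) -> Optional[Tuple[int, List[float]]]:
    """First time t (1-based) with T_t^2 > h and the EWMA vector w_t, else None."""
    c = cholesky_lower(sigma)
    w = [0.0] * len(mu0)
    for t, x in enumerate(xs, start=1):
        # EWMA of deviations from the in-control mean
        w = [lam * (xj - mj) + (1 - lam) * wj for xj, mj, wj in zip(x, mu0, w)]
        # exact covariance of w_t is c_t * sigma (Lowry et al., 1992)
        c_t = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        t2 = sum(zi * zi for zi in whiten(c, w)) / c_t
        if t2 > h:
            return t, w
    return None

def s2_statistics(w: Sequence[float], sigma: Matrix) -> List[float]:
    """S_i^2 = w' Sigma^{-1} w - (r_i w)^2, with R = C^{-1} (so R'R = Sigma^{-1})."""
    z = whiten(cholesky_lower(sigma), w)
    total = sum(zi * zi for zi in z)
    return [total - zi * zi for zi in z]

def prediagnose(w: Sequence[float], sigma: Matrix, n_remove: int) -> List[int]:
    """Indices kept after dropping the n_remove variables with the largest S_i^2."""
    s2 = s2_statistics(w, sigma)
    ranked = sorted(range(len(w)), key=lambda i: s2[i])   # strongest evidence first
    return sorted(ranked[: len(w) - n_remove])
```

In the pre-diagnosis setting, `prediagnose` would be called on the EWMA vector returned by `mewma_alarm`. The Bayesian change point analysis would then run only on the retained variables.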
The method of Jiang et al. (2012) involves solving the constrained optimization problem

$$\min_{\mu_t}\ (w_t - \mu_t)^T \Sigma^{-1} (w_t - \mu_t) \quad \text{subject to} \quad \sum_{j} I\left(|\mu_{t(j)}| \neq 0\right) \le s,$$

where $w_t$ is the exponentially weighted moving average (EWMA) vector at the time of the control chart alarm, and the constraint limits the number of potentially shifted variables to $s$. The objective is essentially to minimize the weighted mean square error of the EWMA statistics. The solution sets the means of the shifted variables equal to their EWMA statistics and finds the smallest sum of squares of the EWMA statistics for the remaining variables. Therefore, we calculate for each variable

$$S_i^2 = w_t^T \Sigma^{-1} w_t - (r_i w_t)^2,$$

where $r_i$ is the $i$th row of $R$ and $R^T R = \Sigma^{-1}$. Each $S_i^2$ statistic is the total weighted sum of squares minus the square of the transformed EWMA statistic of the corresponding variable. A small $S_i^2$ indicates that the corresponding variable has shifted; conversely, a large $S_i^2$ gives little evidence that it has shifted. We rank all process variables by their $S_i^2$ values and remove those with the least evidence of a shift, reducing the dimension $p$ for the subsequent Bayesian change point analysis. Table 8 provides the results of a simulation study of this pre-diagnosis method. In the study, 1000 processes are simulated, and the first two variables shift to 1 after 20 observations. An MEWMA control chart with smoothing constant $\lambda = 0.1$ is applied so that the asymptotic in-control average run length (ARL) is 200. When the chart signals an alarm, the method above is applied to the EWMA statistics of all variables to remove the least likely shifted variables. The table shows that the percentage of simulations in which a shifted variable is mistakenly removed is small. In summary, this pre-diagnosis method can be applied after a control chart alarm to remove the variables with the weakest evidence of a shift. The Bayesian change point analysis can then be applied to the reduced set of process variables.

TABLE 8. Simulation Results of Pre-Diagnosis of Shifted Variables Using the $S_i^2$ Statistic
Number of process variables   Number of variables removed   Percentage of removing any shifted variable (standard error)
15   1    0.3% (0.17%)
15   2    1.0% (0.31%)
15   3    2.2% (0.46%)
15   4    2.4% (0.48%)
15   5    3.5% (0.58%)
20   2    0.7% (0.26%)
20   4    1.0% (0.31%)
20   6    2.3% (0.47%)
20   8    3.4% (0.57%)
20   10   5.7% (0.73%)

Summary and Conclusions

We have shown how a Bayesian approach can be used to select among possible change point models for multivariate SPC. These models are very flexible and can incorporate various assumptions. These include whether the parameters before or after the shift are known, and whether the variance stays constant across the shift. This Bayesian analysis can be accomplished by addressing three related questions:

1. Which variables have shifted?
2. When did the shift occur?
3. What are the values of the parameters after the shift?

The software WinBUGS, used in conjunction with R, can perform the required Monte Carlo simulations. The Bayesian analysis has three main assets:

1. The three issues mentioned above are handled in a single statistical model.
2. The full posterior distributions contain information not available in a likelihood analysis.
3. Prior information, if available, can be easily incorporated.

We have concentrated on using a Bayesian approach for a retrospective analysis in Phase II. Once a signal is raised on a traditional multivariate control chart, we look back to determine which variables shifted, when they shifted, and what the new mean(s) is (are). Change point models can, however, be constructed to perform statistical process monitoring. To accomplish this in the Bayesian context, we would add a “zero-th” model, which is
that no variables shifted. In practice we would probably want to assign a high prior probability to this model, since process shifts are rather rare events. Then, after each sampling stage, we could determine the posterior probability of “no change” and, if this is sufficiently large, continue the process. See Tsiamyrtzis and Hawkins (2007) and Apley (2012).

In this paper, we assume that there is only one process change point. The Bayesian approach to identifying multiple change points in a univariate process has been studied by Erdman and Emerson (2007) and Giordani and Kohn (2008). More recently, Cheon and Kim (2010) attacked a specific form of this problem in a bivariate process setting. Our method could be applied repeatedly to the before-shift data and the after-shift data.

Acknowledgments

The authors would like to thank the Editor and two reviewers for helpful comments. Their careful reading has certainly led to a better presentation of this material.

Appendix A

In the univariate case, assuming that $\mu_1$ and $\sigma_1 = \sigma_2$ are known, the joint posterior distribution of $(\mu_2, k)$ given $x = (x_1, x_2, \dots, x_n)$ can be expressed as

$$p(\mu_2, k \mid x) \propto \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n} x_i^2\right)\right] \times \exp\left[-\frac{\tau(n-k+\lambda)}{2}\left(\mu_2^2 - 2\,\frac{\lambda\alpha + (n-k)\bar{x}'}{n-k+\lambda}\,\mu_2\right)\right],$$

where $\bar{x}' = \sum_{i=k+1}^{n} x_i/(n-k)$ is the average of observations after the mean shift.
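The intermediate algebra behind the joint posterior of $(\mu_2, k)$ stated in the main text can be written out as follows. This is our reconstruction from the main-text formulas and is not necessarily the sequence of steps in the original appendix. Multiply the likelihood by $p_0(\mu_2)$ and $p_0(k)$. Then expand $\sum_{i=k+1}^{n}(x_i - \mu_2)^2 = \sum_{i=k+1}^{n} x_i^2 - 2(n-k)\bar{x}'\mu_2 + (n-k)\mu_2^2$, and drop every factor that depends on neither $\mu_2$ nor $k$ (including $\lambda\alpha^2$ in the exponent):

$$\begin{aligned} p(\mu_2, k \mid x) &\propto \left(\frac{\tau}{2\pi}\right)^{n/2} \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n}(x_i - \mu_2)^2\right)\right] \sqrt{\frac{\lambda\tau}{2\pi}} \exp\left[-\frac{\lambda\tau}{2}(\mu_2 - \alpha)^2\right] \frac{1}{n-1} \\ &\propto \exp\left[-\frac{\tau}{2}\left(\sum_{i=1}^{k}(x_i - \mu_1)^2 + \sum_{i=k+1}^{n} x_i^2 + (n-k+\lambda)\mu_2^2 - 2\left(\lambda\alpha + (n-k)\bar{x}'\right)\mu_2\right)\right]. \end{aligned}$$

Factoring $(n-k+\lambda)$ out of the $\mu_2$ terms gives the two-exponential form. Write $b = \frac{\lambda\alpha + (n-k)\bar{x}'}{n-k+\lambda}$ and integrate over $\mu_2$ with the Gaussian integral

$$\int_{-\infty}^{\infty} \exp\left[-\frac{\tau(n-k+\lambda)}{2}\left(\mu_2^2 - 2b\mu_2\right)\right] d\mu_2 = \sqrt{\frac{2\pi}{\tau(n-k+\lambda)}}\, \exp\left[\frac{\tau(n-k+\lambda)\, b^2}{2}\right].$$

Since $(n-k+\lambda)b^2 = (\lambda\alpha + (n-k)\bar{x}')^2/(n-k+\lambda)$, this yields the marginal posterior $p(k \mid x_1, \dots, x_n)$ given in the main text.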
Appendix B
The model structure of Example 1 is represented
by Figure 13. There are three models, and each is
depicted in a square. The circles in the middle row
represent parameters and their prior distributions.
The circle at the bottom represents the process observations.
Gelman, A. and Rubin, D. (1992). “Inference from Iterative Simulation Using Multiple Sequences”. Statistical Science 7, pp. 457–511.
Geweke, J. (1992). “Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments”. In Bayesian Statistics 4, J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, eds., pp. 169–193. Oxford: Oxford University Press.
Giordani, P. and Kohn, R. (2008). “Efficient Bayesian Inference for Multiple Change-Point and Mixture Innovation Model”. Journal of Business and Economic Statistics 26, pp. 66–77.
Holmes, D. S. and Mergen, A. E. (1993). “Improving the Performance of the T² Control Chart”. Quality Engineering 5, pp. 619–625.
Jiang, W.; Wang, K.; and Tsung, F. (2012). “A Variable-Selection-Based Multivariate EWMA Chart for Process Monitoring and Diagnosis”. Journal of Quality Technology, to appear.
Lowry, C. A.; Woodall, W. H.; Champ, C. W.; and Rigdon, S. E. (1992). “A Multivariate Exponentially Weighted Moving Average Control Chart”. Technometrics 34, pp. 46–53.
Moreno, E.; Casella, G.; and Garcia-Ferrer, A. (2005). “An Objective Bayesian Analysis of the Change Point Problem”. Stochastic Environmental Research and Risk Assessment 9, pp. 191–204.
Raftery, A. E. and Lewis, S. M. (1996). “Implementing MCMC”. In Markov Chain Monte Carlo in Practice, W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, eds., pp. 115–130. London: Chapman & Hall.
Smith, A. F. M. (1975). “A Bayesian Approach to Inference About a Change-Point in a Sequence of Random Variables”. Biometrika 62, pp. 407–416.
Son, Y. S. and Kim, S. W. (2005). “Bayesian Single Change Point Detection in a Sequence of Multivariate Normal Observations”. Statistics 39, pp. 373–387.
Sullivan, J. H. and Woodall, W. H. (2000). “Change-Point Detection of Mean Vector or Covariance Matrix Shifts Using Multivariate Individual Observations”. IIE Transactions 32, pp. 537–549.
Tsiamyrtzis, P. and Hawkins, D. M. (2007). “A Bayesian Approach to Statistical Process Control”. In Bayesian Process Monitoring, Control, and Optimization, B. M. Colosimo and E. del Castillo, eds., pp. 87–107. Boca Raton, FL: Chapman & Hall/CRC.
Wang, K. and Jiang, W. (2009). “High-Dimensional Process Monitoring and Fault Isolation via Variable Selection”. Journal of Quality Technology 41, pp. 247–258.
Zamba, K. D. and Hawkins, D. M. (2009). “A Multivariate Change-Point Model for Change in Mean Vector and/or Covariance Structure”. Journal of Quality Technology 41(3), pp. 285–303.