Phase I outlier detection in profiles with binary data based on penalized likelihood
DOI: 10.1002/qre.2376
RESEARCH ARTICLE
1 Tianjin University, College of Management and Economics, Tianjin, Tianjin, China
2 Industrial Engineering, College of Management and Economics, Tianjin University, Tianjin, Tianjin, China

Correspondence
Yanfen Shang, Industrial Engineering, College of Management and Economics, Tianjin University, Nankai District, Tianjin, Tianjin 300072, China.
Email: syf8110@gmail.com

Funding information
National Natural Science Foundation of China, Grant/Award Numbers: 71672122, 71532008, 71401123, 71402118 and 71472132

Abstract
Profile monitoring has been proven to be important among statistical process control problems. In some specific applications, the response variable of a profile can be categorical data, and numerous methods have been proposed for monitoring this type of profile. Nevertheless, outlier detection for profiles with categorical data has still attracted insufficient attention in the literature. To this end, this paper focuses on binary profiles and develops two schemes for outlier detection, from the viewpoint of penalized likelihood, based on the group LASSO method and directional information, respectively, in which the profiles of interest are treated as an integrated high-dimensional vector. A simulation study shows that the proposed group-type scheme usually performs better than the existing $T_I^2$ chart in terms of detecting outliers correctly and alleviating the masking effect. Finally, a real example is used for illustrating the implementation of the proposed scheme.
KEYWORDS
binary profile, group LASSO, masking effect, outlier detection, penalized likelihood
Qual Reliab Engng Int. 2019;35:1–13. wileyonlinelibrary.com/journal/qre © 2018 John Wiley & Sons, Ltd.
2 LI ET AL.
As for nonlinear profiles, Williams et al12 introduced four $T^2$ methods based on nonlinear regression. Considering both within-profile and between-profile variations, Paynabar and Jin13 developed a wavelet-based mixed-effect model for complex nonlinear profiles. Zhang and Albin3 proposed a $\chi^2$ control chart by treating profiles as high-dimensional vectors and argued that the $\chi^2$ chart is of great help for detecting outliers for nonlinear profiles, especially when the profiles are very complex. Zou et al4 indicated that the $\chi^2$ control chart does outperform its counterparts in the case of complex profiles, but it suffers from a certain masking effect. For this reason, they proposed a new procedure for outlier detection, based on penalized regression, by treating profiles as vectors following Zhang and Albin.3 In certain cases, a single profile may require quite a long time to generate, which is a challenging problem for conventional SPC methods. Motivated by an ingot growth process in semiconductor manufacturing, Dai et al14 proposed a method for monitoring the dynamically growing profile trajectory to detect unexpected changes during the long processing cycle. In a recent paper, Paynabar et al15 focused on Phase I analysis of multichannel nonlinear profiles and proposed a new framework, in which the monitoring statistic is constructed by incorporating the multi-dimensional functional principal component analysis into change-point models.

All the aforementioned works rely on a basic assumption that the response variable of the profile is continuous. However, it is very likely that only categorical data are available in practical applications, as a result of which this assumption can be violated. To this end, Yeh et al16 investigated a profile problem whose response variable is binary. In their study, the logistic regression model was used to express the relationship between the response variable and the explanatory variables, and five $T^2$ control charts were proposed and compared in terms of the signal probability in various scenarios. Integrating the EWMA scheme and the LRT, Shang et al17 studied Phase II monitoring and proposed a novel control chart for binary profiles with random covariates. Paynabar et al18 developed a Phase I chart for surgical-operation improvement, in which the surgical outcomes are binary. The authors constructed the control chart via a likelihood-ratio test derived from a change-point model (LRTCP) based on the risk-adjustment logistic regression. Recently, Shadman et al19 proposed a unified framework combining a change point model with a generalized linear model, which can be used to develop Phase I control charts for profiles with continuous, count, or categorical data.

In a typical Phase I profile problem, one of the most significant steps is to identify the outlying profiles among the reference dataset and remove them so that a reliable functional curve can be established for use in Phase II.20 However, to the best of our knowledge, there are limited previous works specializing in outlier detection for profile processes with categorical data. Although the $T_I^2$ control chart proposed by Yeh et al16 can be used to detect outliers, its performance is only studied in terms of probability of signal, which is incapable of determining the masking effect (see Zou et al4 for details). Therefore, in this study, we aim to develop new schemes for detecting outliers efficiently and analyze their detecting performance more thoroughly. The differences between our work and the existing research are the following. First of all, our study focuses on diagnosing the outliers in Phase I profiles with binary data. Therefore, it is different from those existing schemes which are about change-point detection, Phase II profile monitoring, and/or profiles with numerical data. Moreover, our proposed schemes can be extended to other profiles with categorical data, and the extension is presented in the following section. Secondly, the proposed schemes are quite different from the control-chart-based methods, such as the $T_I^2$ chart in Yeh et al.16 This kind of scheme requires a control limit; if the limit cannot be calculated, it is usually obtained by simulation, which consumes more data and/or time. In contrast, our schemes do not need the control limit and can therefore save data and time. Moreover, the superiority in detecting performance is investigated in the following by comparing with the $T_I^2$ chart.

Note that the outliers in a given dataset generally possess the sparsity property, which means only a small portion of profiles are outliers. Based on this property, there have been several methods for detecting outliers, such as cluster-based methods, classification methods, machine learning methods, statistical model-based methods, and so on. The detailed review can be found in Hodge and Austin21 and Jobe and Pokojovy.22 Because variable selection can be used for selecting a sparse model, we detect outliers based on it in this paper. In fact, variable selection methods have been widely used in the SPC literature. Zou and Qiu23 adapted the LASSO to the multivariate SPC problem and proposed a Phase II control chart by integrating the LASSO-based test statistic with the EWMA procedure to determine the shift direction based on observed data. Combining Bayesian Information Criterion (BIC) with the adaptive LASSO, Shang et al24 developed a diagnosis scheme for multistage processes with binary outputs. Therefore, inspired by Zhang and Albin,3 in this article, we transform the historical profiles into a high-dimensional vector, and then develop an outlier detection scheme based on the group LASSO method which can be found in Yuan and Lin25 and Meier et al.26 Furthermore, the priori directional information
and Qiu22 and the references therein). Therefore, we utilize the following penalized loss function to reduce the complexity of calculation,

$$PL(\delta) = -\ell(\delta) + P_\lambda,$$

where $P_\lambda$ is the penalty function with a tuning parameter $\lambda$.

To select variable at the factor level, we apply the group-type penalty that can encourage sparsity at the factor level, motivated by Yuan and Lin25 and Meier et al.26 Then, the penalized loss function becomes

$$PL(\delta) = -\ell(\delta) + \lambda \sum_{i=1}^{m} \|\delta_i\|_{K_i},$$

where $\|\delta_i\|_{K_i} = (\delta_i^T K_i \delta_i)^{1/2}$ and the kernel matrix $K_i = pI_p$.

Given a specified value of $\lambda$, we need the minimizer

$$\hat{\delta}_\lambda = \arg\min_{\delta} PL(\delta).$$

To obtain $\hat{\delta}_\lambda$, we get the following score function,

$$\frac{\partial PL(\delta)}{\partial \delta} = -X^{*T}(y-\mu) + \Lambda\delta = 0, \qquad (3)$$

where $X^* = \mathrm{diag}\{X, \cdots, X\}$ is a block diagonal matrix with $m$ identical blocks and $X = (x_1, \cdots, x_n)^T$; $y = (y_1^T, \cdots, y_m^T)^T$ and $y_i = (y_{i1}, \cdots, y_{in})^T$; $\mu = (\mu_1^T, \cdots, \mu_m^T)^T$ and $\mu_i = (N_{i1}\pi_{i1}, \cdots, N_{in}\pi_{in})^T$; $\Lambda$ is an $mp \times mp$ diagonal matrix that equals

$$\Lambda = \lambda\sqrt{p} \cdot \mathrm{diag}\left\{\frac{1}{\|\delta_1\|} I_p, \cdots, \frac{1}{\|\delta_m\|} I_p\right\}.$$

Based on Equation (3), we have the IWLS equation as follows,

$$\left(X^{*T}\hat{W}X^* + \Lambda\right)\delta = X^{*T}\hat{W}q, \qquad (4)$$

where $\hat{W} = \mathrm{diag}\{\hat{W}_1, \cdots, \hat{W}_m\}$, $\hat{W}_i = \mathrm{diag}\{N_{i1}\hat{\pi}_{i1}(1-\hat{\pi}_{i1}), \cdots, N_{in}\hat{\pi}_{in}(1-\hat{\pi}_{in})\}$ and $q$ denotes the adjusted dependent variate,

value of BIC can be obtained by the following equation,

$$\mathrm{BIC}_\lambda = -\ell(\hat{\delta}_\lambda) + \frac{1}{2} d_\lambda \ln(n), \qquad (5)$$

where $d_\lambda$ denotes the degree of the model,

$$d_\lambda = \sum_i I\left(\|\hat{\delta}_{\lambda i}\| > 0\right) + \sum_i \frac{\|\hat{\delta}_{\lambda i}\|}{\|\hat{\delta}_i^{(o)}\|}(p-1),$$

where $\hat{\delta}_{\lambda i}$ and $\hat{\delta}_i^{(o)}$ are the $i$th subvectors of $\hat{\delta}_\lambda$ and $\hat{\delta}^{(o)}$, respectively.

Then, set a series of values for $\lambda$ appropriately and utilize BIC to determine the optimal value defined as follows,

$$\lambda^{(*)} = \arg\min_{\lambda} \mathrm{BIC}_\lambda.$$

Finally, the estimation $\hat{\delta}_{\lambda^{(*)}}$ with sparsity property can be a measurement of the outliers in the dataset and the set of outliers is $\hat{\Omega}^* = \{i : \hat{\delta}_{\lambda^{(*)} i} \neq 0\}$.

The above steps are further integrated into the following procedure that is called Group-type Penalized Outlier Detection (GPOD).

Step 1. Given an off-line dataset, obtain the maximum likelihood estimator for each profile, $\hat{\beta}_i$ ($i = 1, \cdots, m$). Let $\hat{\beta}_0$ be the median of $\hat{\beta}_i$.
Step 2. Define the high-dimensional shift vector $\delta$ and obtain its original estimation $\hat{\delta}^{(o)}$ based on the procedure in appendix.
Step 3. Choose an appropriate range and set $T$ levels for the tuning parameter $\lambda$. For each $\lambda^{(t)}$ ($t = 1, \cdots, T$), obtain the $\hat{\delta}_{\lambda^{(t)}}$ by using the following IWLS procedure:
(a) let $\hat{\delta}^{(o)}$ be the starting point, $\hat{\delta}^{(0)} = \hat{\delta}^{(o)}$;
(b) at the $l$th iteration ($l \geq 0$), calculate the $q^{(l)}$, $\hat{W}^{(l)}$, $\Lambda^{(l)}$ based on $\hat{\delta}^{(l)}$, and then update the estimation of $\delta$ via the following equation,

$$\hat{\delta}^{(l+1)} = \left(X^{*T}\hat{W}^{(l)}X^* + \Lambda^{(l)}\right)^{-1} X^{*T}\hat{W}^{(l)}q^{(l)}.$$
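The IWLS update of Equation (4) and the BIC selection of Equation (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_delta`, `loglik`, `select_by_bic`), the thresholds `tiny` and `zero_tol`, and the freezing of groups that have shrunk to zero are hypothetical choices made here. Because $X^*$, $\hat{W}$, and $\Lambda$ are all block diagonal, the sketch updates each profile separately, and it takes $q_i = X\delta_i + \hat{W}_i^{-1}(y_i - \mu_i)$, which follows from the definition of $q$ in the Appendix. The constant $\ln\binom{N_{ij}}{y_{ij}}$ is dropped from the log-likelihood since it does not depend on $\delta$.

```python
import numpy as np

def fit_delta(X, Y, N, beta0, delta0, lam, eps=1e-4, max_iter=200,
              tiny=1e-8, zero_tol=1e-6):
    """IWLS for group-penalized shifts. X: (n, p); Y, N: (m, n); delta0: (m, p)."""
    m, p = delta0.shape
    delta = np.array(delta0, dtype=float)
    for _ in range(max_iter):
        prev = delta.copy()
        for i in range(m):                      # block-diagonal system: one profile at a time
            norm_i = np.linalg.norm(delta[i])
            if norm_i < tiny:                   # assumption: keep a vanished group at zero
                delta[i] = 0.0
                continue
            pi = 1.0 / (1.0 + np.exp(-(X @ (beta0 + delta[i]))))
            w = np.maximum(N[i] * pi * (1.0 - pi), tiny)    # diagonal of W_i
            q = X @ delta[i] + (Y[i] - N[i] * pi) / w       # adjusted dependent variate
            A = X.T @ (w[:, None] * X) + (lam * np.sqrt(p) / norm_i) * np.eye(p)
            delta[i] = np.linalg.solve(A, X.T @ (w * q))
        total = np.abs(delta).sum()
        if total == 0.0 or np.abs(delta - prev).sum() / total <= eps:
            break
    delta[np.linalg.norm(delta, axis=1) < zero_tol] = 0.0   # assumption: final thresholding
    return delta

def loglik(X, Y, N, beta0, delta):
    """Binomial log-likelihood without the ln C(N, y) constant."""
    eta = (beta0 + delta) @ X.T                 # (m, n) linear predictors
    return float(np.sum(Y * eta - N * np.log1p(np.exp(eta))))

def select_by_bic(X, Y, N, beta0, delta_o, lambdas):
    """Pick lambda by Equation (5); delta_o is the unpenalized estimate (nonzero rows assumed)."""
    n, p = X.shape
    norms_o = np.linalg.norm(delta_o, axis=1)
    best = None
    for lam in lambdas:
        d = fit_delta(X, Y, N, beta0, delta_o, lam)
        norms = np.linalg.norm(d, axis=1)
        df = np.sum(norms > 0) + np.sum(norms / norms_o) * (p - 1)
        bic = -loglik(X, Y, N, beta0, d) + 0.5 * df * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, lam, d)
    _, lam_star, d_star = best
    outliers = [i for i in range(d_star.shape[0]) if np.any(d_star[i] != 0)]
    return lam_star, d_star, outliers
```

With `lam=0` the update reduces to a Newton step for the per-profile maximum likelihood shift, so the same routine can serve to obtain a starting estimate.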
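(c) repeat step (b) until the following convergence criterion is met,

$$\left\|\hat{\delta}^{(l)} - \hat{\delta}^{(l-1)}\right\|_1 \Big/ \left\|\hat{\delta}^{(l)}\right\|_1 \leq \varepsilon,$$

where $\hat{\delta}^{(l)}$ is the estimation of $\delta$ at the $l$th iteration. $\varepsilon$ is a small positive constant (eg, $\varepsilon = 10^{-4}$). $\|v\|_1$ denotes the sum of absolute values of all components of $v$.
Step 4. Use BIC to determine $\lambda^{(*)}$.
Step 5. Based on the final estimation $\hat{\delta}_{\lambda^{(*)}}$, obtain the set of outliers $\hat{\Omega}^* = \{i : \hat{\delta}_{\lambda^{(*)} i} \neq 0\}$.

in any other $\beta_{ik}$. To be distinguishable from the GPOD,

$$\pi_{ij} = \frac{\exp\left(\tilde{\delta}_{0i} x_{j0} + x_j^T\hat{\beta}_0\right)}{1 + \exp\left(\tilde{\delta}_{0i} x_{j0} + x_j^T\hat{\beta}_0\right)}.$$

Consequently, we have the following joint log-likelihood function,

$$\ell(\tilde{\delta}_0) = \sum_{i=1}^{m}\sum_{j=1}^{n}\left[\ln\binom{N_{ij}}{y_{ij}} + y_{ij}\left(\tilde{\delta}_{0i} x_{j0} + x_j^T\hat{\beta}_0\right) - N_{ij}\ln\left(1 + \exp\left(\tilde{\delta}_{0i} x_{j0} + x_j^T\hat{\beta}_0\right)\right)\right]. \qquad (6)$$

Define the maximizer of $\ell(\tilde{\delta}_0)$ as the original estimator $\hat{\tilde{\delta}}_0^{(o)}$, which can be obtained via the procedure in Appendix. Similarly, the penalized technique is combined with Equation (6) leading to the following penalized loss function,

$$PL(\tilde{\delta}_0) = -\ell(\tilde{\delta}_0) + \sum_{i=1}^{m} P_{\lambda_i}\left(|\tilde{\delta}_{0i}|\right).$$

The penalty function applied here is supposed to encourage sparsity at the component level. Moreover, adaptive weights are utilized for penalizing different coefficients in the $\ell_1$ penalty (see the adaptive lasso in Zou32). The penalized loss function becomes the following form,

$$PL(\tilde{\delta}_0) = -\ell(\tilde{\delta}_0) + \lambda\sum_{i=1}^{m}\hat{w}_i|\tilde{\delta}_{0i}|,$$

$$PL(\tilde{\delta}_0) = -\ell(\tilde{\delta}_0) + \lambda\sum_{i=1}^{m}\frac{|\tilde{\delta}_{0i}|}{|\hat{\tilde{\delta}}_{0i}^{(o)}|}. \qquad (7)$$

to 0, then set $\tilde{\delta}_{0i}$ as 0 exactly; otherwise, the penalty function can be locally approximated by the following quadratic approximation,

$$P_{\lambda_i}\left(|\tilde{\delta}_{0i}|\right) \approx P_{\lambda_i}\left(|\tilde{\delta}_{0(0)i}|\right) + \frac{1}{2}\,\frac{P'_{\lambda_i}\left(|\tilde{\delta}_{0(0)i}|\right)}{|\tilde{\delta}_{0(0)i}|}\left(\tilde{\delta}_{0i}^2 - \tilde{\delta}_{0(0)i}^2\right).$$

Therefore, Equation (7) can be approximated as follows,

$$PL(\tilde{\delta}_0) = -\ell(\tilde{\delta}_0) + \frac{1}{2}\tilde{\delta}_0^T\Theta\tilde{\delta}_0 + C, \qquad (8)$$

where $C$ is a constant corresponding to $\tilde{\delta}_{0(0)}$ and $\Theta$ is an $m \times m$ diagonal matrix defined as follows,

$$\Theta = \lambda \cdot \mathrm{diag}\left\{\frac{1}{|\hat{\tilde{\delta}}_{01}^{(o)}| \cdot |\tilde{\delta}_{0(0)1}|}, \cdots, \frac{1}{|\hat{\tilde{\delta}}_{0m}^{(o)}| \cdot |\tilde{\delta}_{0(0)m}|}\right\}.$$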
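Because each profile carries a single shift component in this directional setting, minimizing the quadratic approximation in Equation (8) by reweighted least squares reduces to a scalar update per profile. The following Python sketch illustrates one way to iterate it. It is an illustration under stated assumptions rather than the authors' code: it takes $x_{j0} = 1$ (an intercept shift), and the function name `dpod_shifts` and the threshold `tiny` used to set near-zero components to 0 exactly are hypothetical.

```python
import numpy as np

def dpod_shifts(X, Y, N, beta0, d_o, lam, eps=1e-4, max_iter=200, tiny=1e-8):
    """Adaptive-lasso shifts in one direction.

    X: (n, p) design matrix; Y, N: (m, n) counts and sample sizes;
    d_o: (m,) unpenalized shift estimates, used both as the starting point
    and as adaptive weights (all entries assumed nonzero).
    """
    offset = X @ beta0                       # x_j^T beta0_hat for every design point
    d = np.array(d_o, dtype=float)
    for _ in range(max_iter):
        prev = d.copy()
        for i in range(d.size):
            if abs(d[i]) < tiny:             # very close to 0: set to 0 exactly
                d[i] = 0.0
                continue
            pi = 1.0 / (1.0 + np.exp(-(d[i] + offset)))   # assumes x_j0 = 1
            w = N[i] * pi * (1.0 - pi)                      # IWLS weights
            theta = lam / (abs(d_o[i]) * abs(d[i]))         # i-th diagonal entry of Theta
            # scalar ridge-type step: (sum w + theta) * d_new = sum w * q
            d[i] = (w.sum() * d[i] + (Y[i] - N[i] * pi).sum()) / (w.sum() + theta)
        total = np.abs(d).sum()
        if total == 0.0 or np.abs(d - prev).sum() / total <= eps:
            break
    return d
```

Setting `lam=0` removes the penalty, in which case the iteration is a plain Newton step for the unpenalized shift of each profile.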
Differentiate Equation (8) with respect to $\tilde{\delta}_0$ and the score function is as follows,

$$\frac{\partial PL(\tilde{\delta}_0)}{\partial\tilde{\delta}_0} = -I^{*T}(y-\mu) + \Theta\tilde{\delta}_0 = 0,$$

where $I^* = \mathrm{diag}\{\tilde{1}, \cdots, \tilde{1}\}$ is a block diagonal matrix with $m$ identical blocks and $\tilde{1} = (1, \cdots, 1)^T$ is an $n$-dimensional vector; $y$ and $\mu$ have the same definitions as that in Equation (3).

Based on the score function, the following IWLS equation can be used to obtain $\hat{\tilde{\delta}}_{0\lambda}$ for a given $\lambda$,

$$\left(I^{*T}\hat{W}I^* + \Theta\right)\tilde{\delta}_0 = I^{*T}\hat{W}q,$$

where $\hat{W}$ and $q$ have the same definitions as that in Equation (4).

After getting $\hat{\tilde{\delta}}_{0\lambda}$, the BIC value can be calculated based on Equation (5). It is worth noting that the definition of $d_\lambda$ here has been different from that in Equation (5), although the equation form of BIC is the same. According to Zou et al,34 $d_\lambda$ denotes the number

(9) and (10), the log-likelihood function for the $i$th profile could be obtained, and then similarly, the penalized loss function and the corresponding detecting schemes can be written and derived according to the steps in 3.1 and 3.2.

For example, if $y_{ij}$ follows Poisson distribution, the link function becomes

$$\ln\theta_{ij} = x_j^T\beta_i, \quad i = 1, \cdots, m; \; j = 1, \cdots, n,$$

and $b(\theta_{ij}) = \ln\theta_{ij}$, $\gamma(\theta_{ij}) = \theta_{ij}$, $d(y_{ij}) = -\ln(y_{ij}!)$. Therefore, the log-likelihood function for the $i$th profile is the following

$$l_i = \sum_{j=1}^{n}\left[y_{ij}\ln\theta_{ij} - \theta_{ij} - \ln(y_{ij}!)\right].$$

Recall the associated notation in the previous subsections, the log-likelihood function for $m$ profiles is written as

$$\ell(\delta) = \sum_{i=1}^{m}\sum_{j=1}^{n}\left[y_{ij}x_j^T\left(\delta_i + \hat{\beta}_0\right) - \exp\left(x_j^T\left(\delta_i + \hat{\beta}_0\right)\right) - \ln(y_{ij}!)\right].$$
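As a small illustration of the Poisson extension, the joint log-likelihood $\ell(\delta)$ for $m$ profiles can be evaluated directly from the shift matrix. The Python sketch below is only an example: the function name `poisson_loglik` and the array layout (one row per profile) are assumptions made here, and $\ln(y_{ij}!)$ is computed as `lgamma(y + 1)`.

```python
import numpy as np
from math import lgamma

def poisson_loglik(X, Y, beta0, delta):
    """Joint Poisson log-likelihood.

    X: (n, p) design matrix; Y: (m, n) counts;
    beta0: (p,) reference coefficients; delta: (m, p) shifts, one row per profile.
    """
    eta = (beta0 + delta) @ X.T                    # x_j^T (delta_i + beta0), shape (m, n)
    log_fact = np.vectorize(lgamma)(Y + 1.0)       # ln(y_ij!)
    return float(np.sum(Y * eta - np.exp(eta) - log_fact))
```

Swapping this function in for the binomial log-likelihood is the only change the BIC computation would need under these assumptions; the IWLS weights would also have to be replaced by their Poisson counterparts.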
TABLE 1 Performance comparison among the $T_I^2$ chart, GPOD, and DPOD with various shifts in $\beta_{i0}$ for m = 20
Columns (left to right): $\delta_\Omega^T$; $m_o$; then Cf, Uf, Of, and Rf, each reported for $T_I^2$, GPOD, and DPOD in that order
TABLE 2 Performance comparison among the $T_I^2$ chart, GPOD, and DPOD with various shifts in $\beta_{i1}$ for m = 20
Columns (left to right): $\delta_\Omega^T$; $m_o$; then Cf, Uf, Of, and Rf, each reported for $T_I^2$, GPOD, and DPOD in that order
(0, 0.1) 1 0.63 0.74 0.37 ‐ ‐ ‐ 0.04 0.09 0.37 0.33 0.17 0.26
2 0.33 0.56 0.24 0.46 0.26 0.08 0.03 0.07 0.50 0.18 0.11 0.18
3 0.13 0.39 0.16 0.67 0.42 0.09 0.02 0.05 0.53 0.18 0.14 0.22
4 0.03 0.26 0.09 0.75 0.51 0.08 0.01 0.03 0.54 0.21 0.20 0.29
5 0 0.14 0.06 0.70 0.55 0.07 0 0.01 0.46 0.30 0.30 0.41
6 0 0.07 0.04 0.62 0.53 0.05 0 0.01 0.37 0.38 0.39 0.54
(0, 0.2) 1 0.91 0.88 0.71 ‐ ‐ ‐ 0.09 0.12 0.29 0 0 0
2 0.82 0.85 0.57 0 0 0 0.18 0.15 0.43 0 0 0
3 0.63 0.78 0.44 0 0 0 0.37 0.22 0.56 0 0 0
4 0.35 0.70 0.32 0 0 0 0.65 0.30 0.68 0 0 0
5 0.11 0.58 0.20 0 0 0 0.89 0.42 0.80 0 0 0
6 0.02 0.43 0.11 0 0 0 0.95 0.56 0.89 0.03 0.01 0
(0, 0.3) 1 0.87 0.89 0.77 ‐ ‐ ‐ 0.13 0.11 0.23 0 0 0
2 0.62 0.85 0.69 0 0 0 0.38 0.15 0.31 0 0 0
3 0.20 0.80 0.60 0 0 0 0.80 0.20 0.40 0 0 0
4 0 0.70 0.47 0 0 0 1 0.30 0.53 0 0 0
5 0 0.57 0.34 0 0 0 1 0.43 0.66 0 0 0
6 0 0.42 0.22 0 0 0 1 0.58 0.78 0 0 0
TABLE 3 Performance comparison among the $T_I^2$ chart, GPOD, and DPOD with various shifts in both $\beta_{i0}$ and $\beta_{i1}$ for m = 20
Columns (left to right): $\delta_\Omega^T$; $m_o$; then Cf, Uf, Of, and Rf, each reported for $T_I^2$, GPOD, and DPOD in that order
(0.1, 0.1) 1 0.93 0.88 0.59 ‐ ‐ ‐ 0.06 0.12 0.39 0.01 0 0.02
2 0.85 0.81 0.41 0.04 0.01 0 0.10 0.17 0.58 0.01 0.01 0.01
3 0.71 0.71 0.25 0.10 0.03 0 0.17 0.23 0.74 0.02 0.03 0.01
4 0.47 0.58 0.14 0.20 0.04 0 0.24 0.30 0.84 0.09 0.08 0.02
5 0.21 0.42 0.08 0.28 0.06 0 0.28 0.33 0.88 0.23 0.19 0.04
6 0.05 0.30 0.04 0.25 0.07 0 0.18 0.31 0.88 0.52 0.32 0.08
(0.1, 0.2) 1 0.91 0.87 0.73 ‐ ‐ ‐ 0.09 0.13 0.27 0 0 0
2 0.75 0.80 0.60 0 0 0 0.25 0.20 0.40 0 0 0
3 0.43 0.69 0.42 0 0 0 0.57 0.31 0.58 0 0 0
4 0.12 0.53 0.25 0 0 0 0.88 0.47 0.75 0 0 0
5 0.01 0.37 0.15 0 0 0 0.99 0.63 0.85 0 0 0
6 0 0.22 0.06 0 0 0 1 0.78 0.94 0 0 0
(0.2, 0.2) 1 0.88 0.87 0.75 ‐ ‐ ‐ 0.12 0.13 0.25 0 0 0
2 0.64 0.76 0.60 0 0 0 0.36 0.24 0.40 0 0 0
3 0.23 0.60 0.41 0 0 0 0.77 0.40 0.59 0 0 0
4 0.01 0.41 0.22 0 0 0 0.99 0.59 0.78 0 0 0
5 0 0.22 0.09 0 0 0 1 0.78 0.91 0 0 0
6 0 0.10 0.03 0 0 0 1 0.90 0.97 0 0 0
proposed penalized-type schemes can alleviate masking effects to a certain extent. This finding is similar to that in Zou et al.4

As described in Zou et al,4 swamping can lead to normal profiles being considered as outliers, and in outlier detection, masking is usually worse than swamping because it can result in gross distortions. Therefore, the DPOD could be an effective method if masking effect is very serious in reality. Moreover, the GPOD is the most recommendable method in various situations due to its consistently remarkable performance in detecting true outlying profiles.

As shown in Table 2, similar conclusions are obtained. The GPOD outperforms the $T_I^2$ chart and DPOD when there are multiple outlying profiles, and the superiority of the proposed DPOD to $T_I^2$ is more obvious when the shift is small. It is noteworthy that the $T_I^2$ chart does not perform better as the magnitude of shift increases. We think the reason may be that the large shift can influence the accuracy of the computation of $T_I^2$ values.

Table 3 shows the simulation results when shifts are imposed on the model parameters $\beta_{i0}$ and $\beta_{i1}$ simultaneously, and the findings are the same as above. The DPOD performs best among these three schemes in terms of avoiding masking effect. Besides, the GPOD outperforms the other two methods in detecting true outliers across almost all situations.

We also investigate the detecting performance of the methods for m = 10 and m = 30. Likewise, the largest ratio of outliers is set to be $m_o/m = 0.3$. For m = 10, $m_o$ = 1, 2, and 3, and for m = 30, $m_o$ takes nine values ranging from 1 to 9. For the sake of brevity, only the probability of diagnosing all the true outliers correctly is shown in Table 4 and Figure 1, but the original data is available from the authors.

According to the performance comparison in Table 4 and Figure 1, the GPOD is further proven to be superior to the $T_I^2$ and the DPOD in most cases, and the DPOD outperforms the $T_I^2$ as the number of outlying profiles is larger. Moreover, it can be found that the $T_I^2$ remains incapable of identifying outliers correctly in some cases when the ratio of the outliers in the dataset is slightly large.

Besides, in order to further study the detection performance of the proposed methods, we consider another logistic regression model, of which the in-control parameter vector is $\beta_0 = (1, -1, 0.5)^T$ (ie, p = 3) and the design matrix is as follows,

$$X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \ln 1 & \ln 2 & \cdots & \ln 20 \\ 0.1 & 0.2 & \cdots & 2 \end{pmatrix}^T.$$

The simulated results are summarized in Table 5. Only the "Cf" for m = 10 is presented here, because the findings for p = 3 are quite similar to those for p = 2. As shown in Table 5, the GPOD performs best in terms of "Cf" in most cases, and the DPOD also performs better than the $T_I^2$ when the ratio of outliers is large.
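Columns (left to right): $m_o$; then three panels, each giving $\delta_\Omega^T$ (shown only on the first row of each group) followed by Cf for $T_I^2$, GPOD, and DPOD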
1 (0.2, 0) 0.61 0.60 0.45 (0, 0.1) 0.67 0.71 0.46 (0.1, 0.1) 0.88 0.88 0.65
2 0.21 0.32 0.28 0.26 0.45 0.28 0.64 0.70 0.38
3 0.02 0.13 0.16 0.03 0.23 0.16 0.22 0.47 0.17
1 (0.3, 0) 0.89 0.89 0.69 (0, 0.2) 0.83 0.91 0.76 (0.1, 0.2) 0.77 0.87 0.76
2 0.65 0.77 0.46 0.42 0.80 0.56 0.16 0.67 0.50
3 0.22 0.55 0.24 0.04 0.59 0.34 0 0.40 0.23
1 (0.4, 0) 0.86 0.90 0.75 (0, 0.3) 0.66 0.90 0.81 (0.2, 0.2) 0.67 0.85 0.76
2 0.49 0.79 0.54 0.02 0.78 0.65 0.03 0.57 0.43
3 0.07 0.60 0.30 0 0.57 0.45 0 0.25 0.15
FIGURE 1 Performance comparison among the $T_I^2$, GPOD, and DPOD in three typical scenarios for m = 30
TABLE 5 "Cf" comparison among the $T_I^2$ chart, GPOD, and DPOD for m = 10 with various shifts in $\beta$ when p = 3
Columns (left to right): $m_o$; then two panels, each giving $\delta_\Omega^T$ followed by Cf for $T_I^2$, GPOD, and DPOD
To sum up, from the above tables and figure, we conclude that the GPOD scheme can be a favorable detecting method in identifying outlying profiles with binary data. To alleviate masking effect, the DPOD scheme can also be the alternative effective method.

5 | A REAL EXAMPLE APPLICATION

to the early failures of the automobiles for each MOP, $y_{ij}$, is recorded against each MIS (month in service). That is, after MIS(j), $y_{ij}$ out of $N_i$ automobiles failed. So $y_{ij}$ follows the binomial distribution, $y_{ij} \sim \mathrm{BIN}(N_i, \pi_{ij})$ where $\pi_{ij}$ is the failure rate of the automotive sample MOP(i) at time point MIS(j). In this way, the data in Table 6 can be viewed as a dataset consisting of 10 binary profiles, where the response variable is $y_{ij}$ and the explanatory variable is MIS(j) denoted by $x_j$ here. The design matrix is as follows,

$$X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 12 \end{pmatrix}^T.$$

Step 2. Define the high-dimensional shift vector $\delta$ and obtain its original estimator $\hat{\delta}^{(o)}$ based on equations in appendix.
Step 3. Set an appropriate range and grid it into $T$ levels for the penalty tuning parameter $\lambda$. For each
MIS(j)
MOP(i) Ni 1 2 3 4 5 6 7 8 9 10 11 12
that have resulted in significant improvements in the article. This research was supported by the National Natural Science Foundation of China (Nos. 71672122, 71532008, 71472132, 71402118 and 71401123).

REFERENCES
1. Mahmoud MA, Woodall WH. Phase I analysis of linear profiles with calibration applications. Technometrics. 2004;46:380-391.
2. Mahmoud MA, Parker PA, Woodall WH, Hawkins DM. A change point method for linear profile data. Qual Reliab Eng Int. 2007;23(2):247-268.
3. Zhang H, Albin S. Detecting outliers in complex profiles using a χ2 control chart method. IIE Trans. 2009;41:335-345.
4. Zou C, Tseng ST, Wang Z. Outlier detection in general profiles using penalized regression method. IIE Trans. 2014;46(2):106-117.
5. Kang L, Albin S. On-line monitoring when the process yields a linear. J Qual Technol. 2000;32(4):418-426.
6. Kim K, Mahmoud MA, Woodall WH. On the monitoring of linear profiles. J Qual Technol. 2003;35(3):317-328.
7. Zou C, Zhang Y, Wang Z. A control chart based on a change-point model for monitoring linear profiles. IIE Trans. 2006;38(12):1093-1103.
8. Noorossana R, Eyvazian M, Amiri A, Mahmoud MA. Statistical monitoring of multivariate multiple linear regression profiles in phase I with calibration application. Qual Reliab Eng Int. 2010;26(3):291-303.
9. Mestek O, Pavlík J, Suchánek M. Multivariate control charts: control charts for calibration curves. Fresenius J Anal Chem. 1994;350(6):344-351.
10. Yeh A, Zerehsaz Y. Phase I control of simple linear profiles with individual observations. Qual Reliab Eng Int. 2013;29(6):829-840.
11. Mahmoud MA. Phase I analysis of multiple linear regression profiles. Commun Stat Simul Comput. 2008;37(10):2106-2130.
12. Williams JD, Woodall WH, Birch JB. Statistical monitoring of nonlinear product and process quality profiles. Qual Reliab Eng Int. 2007;23(8):925-941.
13. Paynabar K, Jin J. Characterization of non-linear profiles variations using mixed-effect models and wavelets. IIE Trans. 2011;43(4):275-290.
14. Dai C, Wang K, Jin R. Monitoring profile trajectories with dynamic time warping alignment. Qual Reliab Eng Int. 2014;30(6):815-827.
15. Paynabar K, Zou C, Qiu P. A change-point approach for phase-I analysis in multivariate profile monitoring and diagnosis. Technometrics. 2016;58:191-204.
16. Yeh AB, Huwang L, Li YM. Profile monitoring for a binary response. IIE Trans. 2009;41(11):931-941.
17. Shang Y, Tsung F, Zou C. Profile monitoring with binary data and random predictors. J Qual Technol. 2011;43(3):196-208.
18. Paynabar K, Jin J, Yeh AB. Phase I risk-adjusted control charts for monitoring surgical performance by considering categorical covariates. J Qual Technol. 2012;44(1):39-53.
19. Shadman A, Mahlooji H, Yeh AB, Zou C. A change point method for monitoring generalized linear profiles in phase I. Qual Reliab Eng Int. 2015;31(8):1367-1381.
20. Qiu P, Zou C, Wang Z. Nonparametric profile monitoring by mixed effects modeling. Technometrics. 2010;52:265-277.
21. Hodge V, Austin J. A survey of outlier detection methodologies. Artif Intell Rev. 2004;22(2):85-126.
22. Jobe JM, Pokojovy M. A cluster-based outlier detection scheme for multivariate data. J Am Stat Assoc. 2015;110(512):1543-1551.
23. Zou C, Qiu P. Multivariate statistical process control using LASSO. J Am Stat Assoc. 2009;104(488):1586-1596.
24. Shang Y, Zi X, Tsung F, He Z. LASSO-based diagnosis scheme for multistage processes with binary data. Comput Ind Eng. 2014;72:198-205.
25. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Series B Stat Methodology. 2006;68(1):49-67.
26. Meier L, Van De Geer S, Bühlmann P. The group lasso for logistic regression. J R Stat Soc Series B Stat Methodology. 2008;70(1):53-71.
27. Shang Y, Man J, He Z, Ren H. Change-point detection in phase I for profiles with binary data and random predictors. Qual Reliab Eng Int. 2016;32(7):2549-2558.
28. McCullagh P, Nelder JA. Generalized Linear Models. 37 CRC Press; 1989.
29. Hawkins DM. Identification of Outliers. 11 London: Chapman and Hall; 1980.
30. Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6(2):461-464.
31. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc B Methodol. 1996;58:267-288.
32. Zou H. The adaptive lasso and its oracle properties. J Am Stat Assoc. 2006;101(476):1418-1429.
33. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001;96(456):1348-1360.
34. Zou H, Hastie T, Tibshirani R. On the "degrees of freedom" of the lasso. Ann Stat. 2007;35(5):2173-2192.
35. Zou C, Ning X, Tsung F. LASSO-based multivariate linear profile monitoring. Ann Oper Res. 2012;192(1):3-19.

Zhen Li is a graduate student of the College of Management and Economics at Tianjin University, China.

Yanfen Shang is an associate professor of the College of Management and Economics at Tianjin University, China. She received her BS and MS degrees from Tianjin University, Tianjin, People's Republic of China, and PhD degree from Hong Kong University of Science and Technology (HKUST), Hong Kong. Her research interests include quality management and statistical process control.
Zhen He is a professor in the College of Management and Economics, Tianjin University. He received his PhD in Management Science and Engineering from Tianjin University, China, in 2001. He is the recipient of the Outstanding Research Young Scholar Award of the National Natural Science Foundation of China. His research interests focus on statistical quality control, DOE, and Six Sigma management.

How to cite this article: Li Z, Shang Y, He Z. Phase I outlier detection in profiles with binary data based on penalized likelihood. Qual Reliab Engng Int. 2019;35:1–13. https://doi.org/10.1002/qre.2376

APPENDIX: OBTAIN THE ORIGINAL ESTIMATORS OF THE SCHEMES

The original estimator $\hat{\delta}^{(o)}$ can be obtained by solving the

where $\hat{W} = \mathrm{diag}\{\hat{W}_1, \cdots, \hat{W}_m\}$, $\hat{W}_i = \mathrm{diag}\{N_{i1}\hat{\pi}_{i1}(1-\hat{\pi}_{i1}), \cdots, N_{in}\hat{\pi}_{in}(1-\hat{\pi}_{in})\}$ and $\hat{\pi}_{ij}$ needs to be updated based on the current $\hat{\delta}$ for each step of iteration; $q$ denotes the adjusted dependent variate,

$$q = \mathrm{logit}(\pi) - X^*\hat{\beta}_0^* + \hat{W}^{-1}(y-\mu),$$

where $q = (q_1^T, \cdots, q_m^T)^T$, $q_i = (q_{i1}, \cdots, q_{in})^T$, $\pi = (\pi_1^T, \cdots, \pi_m^T)^T$, $\pi_i = (\pi_{i1}, \cdots, \pi_{in})^T$, $i = 1, \cdots, m$; $\hat{\beta}_0^* = (\hat{\beta}_0^T, \cdots, \hat{\beta}_0^T)^T$ is an $m \times p$-dimensional vector with each subvector being $\hat{\beta}_0$. Accordingly, the $\hat{\delta}^{(o)}$ can be obtained by solving Equation (A2).

Similarly, the $\hat{\tilde{\delta}}_0^{(o)}$ also needs to be obtained by solving an IWLS equation. Take the derivative of Equation (6) with respect to $\tilde{\delta}_0$ and obtain the following score function,

$$\frac{\partial\ell(\tilde{\delta}_0)}{\partial\tilde{\delta}_0} = I^{*T}(y-\mu),$$