Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence


journal homepage: www.elsevier.com/locate/engappai

Process monitoring using variational autoencoder for high-dimensional


nonlinear processes✩
Seulki Lee a , Mingu Kwak a , Kwok-Leung Tsui b , Seoung Bum Kim a ,∗
a
School of Industrial Management Engineering, Korea University, 145 Anam-Ro, Seoungbuk-Gu, Seoul 02841, Republic of Korea
b
Department of Systems Engineering and Engineering Management, City University of Hong Kong, Hong Kong 999077, Hong Kong

ARTICLE INFO ABSTRACT


Keywords: In many industries, statistical process monitoring techniques play a key role in improving processes through
Statistical process monitoring variation reduction and defect prevention. Modern large-scale industrial processes require appropriate mon-
Variational autoencoder itoring techniques that can efficiently address high-dimensional nonlinear processes. Such processes have
High-dimensional process
been successfully monitored with several latent variable-based methods. However, because these monitoring
Nonlinear process
methods use Hotelling’s 𝑇 2 statistics in the reduced space, a normality assumption underlies the construction
Multivariate control chart
of these tools. This assumption has limited the use of latent variable-based monitoring charts in both nonlinear
and nonnormal situations. In this study, we propose a variational autoencoder (VAE) as a monitoring method
that can address both nonlinear and nonnormal situations in high-dimensional processes. VAE is appropriate
for 𝑇 2 charts because it causes the reduced space to follow a multivariate normal distribution. The effectiveness
and applicability of the proposed VAE-based chart were demonstrated through experiments on simulated data
and real data from a thin-film-transistor liquid-crystal display process.

1. Introduction between variables. The following control limit of the 𝑇 2 chart is derived
under an assumption of normality (Montgomery, 2009):
Many industrial processes are multivariate in nature because the 𝑝(𝑛 + 1)(𝑛 − 1)
quality of a product depends on many quality characteristics (Botía 𝐶𝐿𝑇 2 = 𝐹(𝛼,𝑝,𝑛−𝑝) , (2)
𝑛(𝑛 − 𝑝)
et al., 2013; Lu, 1998). Statistical process monitoring is a popular
where n and p are the number of observations and the number of
method used to maintain the stability of a process and prevent vari-
quality variables, respectively; 𝛼 is the Type I error rate (i.e., the false
ations in manufacturing and service operations (He and Wang, 2017;
alarm rate); and 𝐹(𝛼,𝑝,𝑛−𝑝) is the upper 𝛼th quantile of the F -distribution
Woodall et al., 2000). Multivariate control charts are the most popular
with p and (n−p) degrees of freedom.
process monitoring method to use when overall quality depends on
Indeed, multivariate process monitoring based on Hotelling’s 𝑇 2
simultaneously taking many quality characteristics into consideration
is ineffective with a large of number of correlated quality variables.
(Pham, 2001). Several multivariate control charts have been proposed,
With these correlated data, the covariance matrix S is difficult to invert
such as Hotelling’s 𝑇 2 charts, multivariate exponential weighted mov-
because the covariance matrix becomes almost a singular matrix, which
ing average charts, and multivariate cumulative sum charts (Lowry
leads to problematic results (Mason and Young, 2002; Seborg et al.,
and Montgomery, 1995). Of these, the most widely used method is
2010). Moreover, a large number of variables may degrade the ability
Hotelling’s 𝑇 2 control chart (Hotelling, 1947). The monitoring statistics
to detect a progress shift and lead to a multicollinearity problem (Ku
for the 𝑇 2 chart are computed from the following equation:
et al., 1995).
( )𝑇 ( )
𝑇 2 = 𝐱 − 𝐱 𝑺 −1 𝐱 − 𝐱 , (1) For monitoring high-dimensional process data, various latent
variable-based control charts have been proposed. The latent variable
where 𝐱 and 𝑺 are, respectively, the sample mean vector and sample models can extract the important information from the process by
covariance matrix determined from the in-control data. The 𝑇 2 statistic projecting the high-dimensional process data into a lower-dimensional
can be considered as the distance of an observation from the mid- space (Ge, 2017). A principal component analysis (PCA)-based con-
point of the in-control observations while considering the correlation trol chart was developed to address monitoring problems with large

✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work.
For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2019.04.013.
∗ Corresponding author.
E-mail addresses: sklee2@korea.ac.kr (S. Lee), min9kwak@korea.ac.k (M. Kwak), kltsui@cityu.edu.hk (K.-L. Tsui), sbkim1@korea.ac.kr (S.B. Kim).

https://doi.org/10.1016/j.engappai.2019.04.013
Received 21 August 2018; Received in revised form 10 April 2019; Accepted 26 April 2019
Available online 15 May 2019
0952-1976/© 2019 Elsevier Ltd. All rights reserved.
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

numbers of correlated variables (Aguado and Rosen, 2008; Jolliffe, for high-dimensional industrial process data (Ge, 2018). However, PCA-
1986; Mason and Young, 2002). PCA involves an orthogonal linear based charts tend to perform poorly with some complicated industrial
transformation of the data to an ordered set of variables sorted by their processes involving nonlinear characteristics because PCA has a limited
variance (Shlens, 2014). Principal components (PCs) are uncorrelated, ability to deal with nonlinear processes (Dong and McAvoy, 1996; Ge
and most of the information in the original data can be explained by et al., 2013).
the first few PCs. Integration of the PCA with control chart techniques To address nonlinear characteristics, researchers have presented
has been widely used for process monitoring because it can handle several nonlinear extensions of the PCA method (Choi et al., 2005;
correlated high-dimensional data (Wise and Gallagher, 1996). PCA- Zhao and Xu, 2005). Dong and McAvoy (1996) proposed a nonlinear
based control charts are known to enhance the capability for early fault PCA by combining a principal curve and a neural network. Alternative
detection as well as the changes in the covariance structure (Kourti, nonlinear PCA methods have been proposed to solve nonlinear moni-
2005). Jackson (1959) proposed PCA-based 𝑇 2 charts that use the first toring problems (Cheng and Chiu, 2005; Maulud et al., 2006). Kernel
k PCs. That is, the monitoring statistics of Hotelling’s 𝑇 2 are calculated PCA (KPCA) is frequently used for nonlinear process monitoring. The
in the reduced feature space generated by the PCA. When the PCA is basic idea behind KPCA is to first map the input space into a high-
applied to observation matrix 𝑿, 𝑿 can be decomposed into the score dimensional space via nonlinear kernel mapping and then to compute
matrix 𝑼 = [𝑢1 , 𝑢2 , … , 𝑢𝑝 ] and the loading matrix 𝑽 = [𝑣1 , 𝑣2 , … , 𝑣𝑝 ] the PCs in that feature space (Li and Yang, 2014). KPCA-based control
based on the first k PCs: charts effectively capture the nonlinear relationships in the process

𝑘
variables and have superior monitoring performance compared with
𝑿 = 𝑼 𝒌 𝑽 𝑇𝒌 + 𝑬 = 𝑢𝑖 𝑣𝑇𝑖 + 𝑬, (3)
that of standard PCA (Ge et al., 2009; Lee et al., 2004).
𝑖=1
Recently, deep learning can be applied to various industrial process
where 𝑬 is the residual matrix in the PCA model measuring the noise
applications such as process control, optimization, and estimation of
or variations associated with (p−k) PCs. Hence, the first k PCs can
quality variables (Sun and Ge, 2018; Yao and Ge, 2018). Among
be used to calculate the following PCA-based 𝑇 2 statistic (Kourti and
them, deep learning methods for process monitoring have been ac-
MacGregor, 1995):
tively studied. Many deep learning neural network models, such as an

𝑘
𝑢2𝑖 autoencoder (AE), denoising AE, sparse AE, and restricted Boltzmann
𝑇𝑃2𝐶𝐴 = , (4) machines, are used for feature extraction techniques to address the
𝑖=1 𝑠2𝑢
𝑖
nonlinear relationships between process characteristics (Daszykowski
where 𝑠2𝑢 is the variance of the scores 𝑢𝑖 . Note that the 𝑇𝑃2𝐶𝐴 statistic et al., 2003; Hu et al., 2017; Lv et al., 2016; Shaheryar et al., 2016;
𝑖
is equivalent to the traditional 𝑇 2 statistics if k in Eq. (4) is replaced Zhang et al., 2013). AE is an efficient and flexible dimensionality
with p. Under the assumption that the data follow a multivariate normal reduction algorithm for large-scale data and has a specific structure of
distribution, the control limit of a PCA-based 𝑇 2 chart can be computed neural networks in which the input is the same as the output. It reduces
as follows: the input variables into lower-dimensional hidden nodes and then
𝑘(𝑛 + 1)(𝑛 − 1) reconstructs the output from the reduced data. AE-based control charts
𝐶𝐿𝑇 2 = 𝐹(𝛼,𝑘,𝑛−𝑘) , (5)
𝑃 𝐶𝐴 𝑛(𝑛 − 𝑘) provide more compact and informative monitoring results. However,
where n and k are, respectively, the number of observations and the these methods do not explicitly consider the fact that the monitored
number of PCs retained by the PCA, and 𝛼 is the false alarm rate. data should follow a multivariate normal distribution in the reduced
Because 𝑇𝑃2𝐶𝐴 measures the variation of each observation within the space when using 𝑇 2 charts. The decision boundary of the 𝑇 2 chart
first k PCs spaces, it only detects variations in the subspaces of the first k forms an ellipsoidal shape. Therefore, when in-control observations
PCs. That is, PCA-based 𝑇 2 charts may be unable to detect faults caused have both nonlinear and nonnormal patterns, the monitoring statistics
by (p−k) PCs (Mastrangelo et al., 1996). Hence, squared prediction of a 𝑇 2 chart cannot encompass the nature of the data, leading to a
error (SPE) charts have been proposed to detect the changes in events weakness that degrades detection performance. Furthermore, control
not explained by the k PCs (Jackson and Mudholkar, 1979). SPE charts limits using the F -distribution provide reliable and accurate results
can be constructed by using the residuals obtained from the remaining when the data follow a multivariate normal distribution.
set of (p−k) PCs. By calculating the quadratic orthogonal distance to In the present study, we focused on developing a multivariate
the k PC spaces, they measure the amount of variations that are not control chart based on the variational autoencoder (VAE) that can
captured by the retained PCs (Phaladiganon et al., 2013). PCA-based simultaneously handle nonlinear and nonnormal processes. VAE has
SPE statistics monitor the squared error between the true vector 𝑥 and recently been developed as a deep generative model, which is a pow-
the vector 𝑥̂ estimated by the PCA: erful method for learning representation from data in a nonlinear way.
( )2 It exploits information in the data density to find an efficient lower
∑ 𝑘
𝑆𝑃 𝐸𝑃 𝐶𝐴 = 𝑥 − 𝑢𝑖 𝑣𝑇𝑖 ̂ 2.
= (𝑥 − 𝑥) (6) dimensional feature space as a multivariate normal distribution (An and
𝑖=1 Cho, 2015; Suh et al., 2016). Therefore, the use of VAE for process
Assuming that the prediction errors follow a normal distribution, we monitoring can be powerful for both nonlinear characteristics and
can compute the control limits of SPE charts by the following approx- nonnormal distribution situations.
imate value based on a weighted chi-square distribution (Box, 1954): The main contributions of this study can be summarized as follows:
(1) We propose the use of VAE to monitor high-dimensional non-
𝑣 2 linear processes. VAE is not only a popular deep generative model,
𝐶𝐿𝑆𝑃 𝐸𝑃 𝐶𝐴 = 𝜒 , (7)
2𝑚 (2𝑚2 ∕𝑣,𝛼) but also an effective dimensionality reduction method in many fields.
where 𝑚 and 𝑣 are, respectively, the sample mean and sample variance Moreover, VAE is a powerful nonlinear feature extraction method that
of the 𝑆𝑃 𝐸𝑃 𝐶𝐴 , and 𝛼 is the false alarm rate. This approximating preserves the local structure of the training data.
distribution is found to work well even in cases where the prediction (2) Because existing latent variable-based monitoring methods use
errors do not follow a normal distribution (Van Sprang et al., 2002). 𝑇 2 statistics in the reduced space, they rely on an assumption that in-
By applying 𝑇 2 and SPE statistics, we can detect faults and diagnose control observations follow a normal distribution. VAE uses variational
them with greater proficiency (Burgas et al., 2018; Yin et al., 2012). inference to reduce high-dimensional data into low-dimensional data
Moreover, probabilistic PCA-based charts have been introduced to that follow a multivariate normal distribution. Therefore, the use of
provide a more natural expression for the process data. An expectation– VAE for process monitoring can address the limitation posed by the
maximization algorithm applied for parameter estimation in the prob- distributional assumption underlying the 𝑇 2 charts of many existing
abilistic PCA can greatly reduce the computation burden, particularly latent variable-based monitoring methods.

14
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 1. Illustration of the VAE architecture.

(3) To demonstrate the effectiveness and applicability of the pro- KL measures the difference between two probability distributions.
posed method, we use simulated and real data to compare the pro- The value of KL is zero when the two probability distributions are the
posed monitoring method with existing latent variable-based methods same. The KL part of a loss approximates the distribution of latent
in terms of false alarm rates and fault detection rates. The results variables to a multivariate normal distribution. Thus, the encoder of
demonstrate that our proposed method outperforms the alternatives. VAE 𝑞𝜙 (𝑧|𝑥) learns the latent variables that follow a multivariate
The remainder of the paper is organized as follows. Section 2 de- normal distribution over a k-dimensional latent space. In other words,
scribes the proposed VAE-based control chart by emphasizing its ability for given training data 𝒙, the encoder derives the mean vector 𝝁𝒛 =
to handle nonlinear and nonnormal processes. In Section 3, we conduct [𝜇𝑧1 , 𝜇𝑧2 , … , 𝜇𝑧𝑘 ]𝑇 and the following covariance matrix of the multivari-
the simulation study to examine the performance of the proposed VAE- ate normal distribution from which the latent variables are sampled:
based control chart under various high-dimensional scenarios. Section 4
presents the results of a case study that used real data from a thin-film- ⎡𝜎𝑧21 0 0 ⎤
⎢ 0 2 ⋯ ⎥
transistor liquid-crystal display (TFT-LCD) process. Section 5 provides 𝜎 0
𝜮𝒛 = ⎢ 𝑧2 ⎥, (9)
the concluding remarks. ⎢ ⋮ ⋱ ⋮ ⎥
⎢ 0 0 𝜎𝑧2 ⎥
⎣ ⋯
𝑘

2. Proposed methodology: Multivariate control chart based on the For this reason, dimensionality reduction based on the VAE can be
VAE suitable for 𝑇 2 process monitoring.
The encoder 𝑞𝜙 (𝒛|𝒙) consists of several fully connected layers with
As mentioned in Section 1, latent variable-based monitoring meth- multiple nonlinear functions to map the input data 𝒙 into a latent
ods use 𝑇 2 charts in the reduced space under an assumption that the variable 𝒛 that follows a multivariate normal distribution with mean
in-control observations follow a multivariate normal distribution. In vector 𝝁𝒛 and covariance matrix 𝜮 𝒛 :
this study, we propose the use of the VAE for monitoring methods to
𝑞𝜙 (𝒛|𝒙) =  (𝒛; 𝝁𝒛 , 𝜮 𝒛 ). (10)
address nonlinear and nonnormal processes.
(𝑙)
VAE is a popular generative model that combines Bayesian infer- We denote by 𝑾 and

𝑏(𝑙)
the weight vector connecting the 𝑙th

ence with deep neural networks and efficiently obtains a nonlinear hidden layer to hidden node ℎ in the (𝑙 + 1)th hidden layer and the bias
low-dimensional feature space (Kingma et al., 2014). As can be seen term of ℎ, respectively. We wish to learn 𝜇𝑧𝑗 and 𝜎𝑧𝑗 , for 𝑗 = 1, 2, … , 𝑘,
from Fig. 1, the VAE consists of two parts, i.e., the encoder 𝑞𝜙 (𝑧|𝑥) with the encoder part of VAE. The outputs of the encoder are calculated
and the decoder 𝑝𝜃 (𝑥|𝑧), which are multilayered neural networks pa- as follows:
rameterized by 𝜙 and 𝜃, respectively. Although the encoder aims to
𝜇𝑧𝑗 (𝒙) = 𝐖(L)
𝜇 𝒉
(L)
+ b(L)
𝜇 , (11)
reduce the dimensionality by mapping the original training data 𝒙 = zj zj

[𝑥1 , 𝑥2 , … , 𝑥𝑝 ]𝑇 into a lower dimensional vector 𝒛 = [𝑧1 , 𝑧2 , … , 𝑧𝑘 ]𝑇 ,


𝜎𝑧𝑗 (𝒙) = 𝑾 (𝐿)
𝜎 𝒉
(𝐿)
+ 𝑏(𝐿)
𝜎 , (12)
the decoder uses the latent variable 𝑧 to reconstruct the original data. 𝑧𝑗 𝑧𝑗

The loss function of VAE imposes restrictions on the distribution of the where 𝐿 is an index of the next-to-last layer of the encoder. 𝒉(𝑙) is the
latent variables (Kingma and Welling, 2014). This loss function consists 𝑙th hidden layer, where each of its hidden nodes ℎ(𝑙)
𝑗 is calculated by
of a reconstruction loss and of the Kullback–Leibler (KL) divergence
between the learned latent distribution and a prior distribution 𝑝(𝑧) = ℎ(𝑙) (𝑙−1) (𝑙−1)
𝑗 = 𝜋(𝑾 𝑗 𝒉 + 𝑏𝑗(𝑙−1) ), (13)
 (𝑧|0, 𝑰), where 𝑰 is the identity matrix:
where ℎ(𝑙)
𝑗 = 𝑥𝑗 , for 𝑙 = 2, … , 𝐿, and 𝜋 is the nonlinear activation
−E𝑞𝜙 (𝑧|𝑥) (log 𝑝𝜃 (𝑥|𝑧)) + KL( 𝑞𝜙 (𝑧|𝑥) ∥ 𝑝(𝑧)). (8) function of each layer such as sigmoid, hyperbolic tangent, and rectified

15
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 2. The decisional boundary and a corresponding control chart based on the VAE.

linear unit (ReLU) (Nair and Hinton, 2010). By encoding the latent generated 1000 observations from a 100-dimensional multivariate log-
vectors as mean 𝝁𝑧 and variance 𝝈 2𝑧 from training data 𝒙, we can draw normal distribution with a zero mean and a covariance matrix that has
samples 𝒛 = [𝑧1 , 𝑧2 , … , 𝑧𝑘 ]𝑇 from  (𝝁𝑧 , 𝜮 𝑧 ). These low-dimensional unit variances and 0.3 off-diagonal terms. For visualization, we reduced
data 𝑧 in the latent space of the VAE are monitored on a 𝑇 2 chart. Note the dimensions to 2 from 100. Fig. 2 illustrates the dimensionality
that we used the average value of multiple samples as 𝒛 to reflect the reduction results of the VAE with the corresponding control chart.
variability of 𝑞𝜙 (𝒛|𝒙) in monitoring statistics. The monitoring statistics Fig. 2(a) depicts the two-dimensional representation of the VAE and the
and control limits of the VAE-based 𝑇 2 chart are computed from the decision boundary formed by the 𝑇 2 chart when the expected Type I er-
following equations: ror rate is 0.05. As mentioned earlier in this section, the latent variables
( )𝑇 ( ) of the VAE are learned to follow a multivariate normal distribution.
𝑇𝑉2 𝐴𝐸 = 𝒛 − 𝒛 𝑺 −1 z 𝒛−𝒛 , (14)
Therefore, we expect the latent variables to follow a bivariate normal
𝑘(𝑛 + 1)(𝑛 − 1) distribution in the reduced two-dimensional space. In Fig. 2(a), the en-
𝐶𝐿𝑇 2 = 𝐹(𝛼,𝑘,𝑛−𝑘) , (15)
𝑉 𝐴𝐸 𝑛(𝑛 − 𝑘) coder outputs resemble a bivariate normal distribution, as we expected.
where 𝒛 and 𝑺 z are, respectively, the sample mean vector and the Consequently, the decision boundary of 𝑇 2 can properly capture the
covariance matrix determined from the reduced in-control data, n is training data in the reduced feature space. Fig. 2(b) illustrates both the
the number of in-control observations, and 𝛼 is the Type I error rate. 𝑇 2 and the SPE chart.
Note that the decoder network 𝑝𝜃 (𝒙|𝒛) =  (𝒙; 𝝁𝒙 , 𝜮 𝒙 ) maps the
latent vector 𝒛 to the reconstruction of 𝒙. The derivation is similar 3. Simulations
to the encoder network. For 𝑗 = 1, 2, … , 𝑝, mean 𝜇𝑥𝑗 and standard
deviation 𝜎𝑥𝑗 are calculated as 3.1. Simulation setup

(𝐿′ ) ′ (𝐿′ )
𝜇𝑥𝑗 (𝑧) = 𝑾 𝜇𝑥 𝒉(𝐿 ) + 𝑏𝜇𝑥 , (16) We conducted a simulation to examine the properties of the pro-
𝑗 𝑗
( )𝐿′ ′ ( ) 𝐿′ posed VAE-based chart and compared it with existing latent variable-
𝜎𝑥𝑗 (𝑧) = 𝑾 𝜎𝑥 𝒉(𝐿 ) + 𝑏𝜎𝑥 , (17) based charts, such as the PCA, KPCA, and AE under various situations.
𝑗 𝑗

where 𝐿′ is an index of the next-to-last layer of the decoder. The hidden We used Type I and Type II error rates and average run length (ARL)
nodes are defined as to evaluate the performance of the control charts. Simulations were
conducted for both normal and nonnormal situations. For nonnormal
(𝑙′ ) ′ ′ ′
ℎ𝑗 = 𝜋(𝑾 (𝑙𝑗 −1) 𝒉(𝑙 −1) + 𝑏(𝑙𝑗 −1) ), (18) cases, we used lognormal distributions to represent skewed processes,
t -distributions to represent symmetric and high kurtosis processes, and
for 𝑙′ = 2, … , 𝐿′ . In the decoder, the first layer is set as ℎ(1)
𝑗 = 𝑧𝑗 . This 𝒛
( ) gamma distributions to represent skewed and high kurtosis situations.
is randomly sampled from  𝒛; 𝝁𝒛 , 𝜮 𝒛 . On the training phase through Based on these four distributions, we generated 2000 in-control ob-
backpropagation, we are faced with a problem in which the random servations with 300 and 500 variables for Phase I analysis. Phase I
sampling part is nondifferentiable. We can use a reparameterization analysis attempts to isolate the in-control data from a historical data
trick that approximates z with another normal distribution 𝜺 ∼  (0, 𝑰) set and to establish the monitoring charts based on these isolated in-
to solve the problem. It enables the transformation of random variable control data (Sukchotrat et al., 2009). We used the zero mean vector

𝒛 into differentiable form 𝒛 = 𝝁𝒛 + 𝜮 𝒛 𝜺 (Doersch, 2016). 𝝁0 = [0, 0, … , 0]𝑇 and the covariance matrix 𝜮 0 . Let 𝜮 0 denote a
Because 𝑝𝜃 (𝒙|𝒛) produces a distribution of 𝒙 values given 𝒛, it covariance matrix with unit variances and an equal-sized block where
is referred to as a probabilistic decoder. We can draw samples from Cov(𝑿 𝒊 , 𝑿 𝒋 ) is 0.9 if 𝑿 𝒊 and 𝑿 𝒋 belong to the same block and 0.2
 (𝝁𝑥 , 𝜮 𝑥 ), and their expected value is equal to 𝝁𝒙 . Therefore, the otherwise. This covariance matrix that Thulin (2014) used is useful for
monitoring statistics and control limits of VAE-based SPE charts are modeling situations frequently found in manufacturing processes with
computed as follows: high correlations between adjacent variables. In our simulation study,
( )2 we set the block size as 5% of the original number of variables. Fig. 3
𝑆𝑃 𝐸𝑉 𝐴𝐸 = 𝒙 − 𝝁𝒙 (19)
𝑣 2 illustrates an example of a covariance matrix with a block size of two.
𝐶𝐿𝑆𝑃 𝐸𝑉 𝐴𝐸 = 𝜒 2 , (20) We also generated 2000 out-of-control observations with a noncen-
2𝑚 (2𝑚 ∕𝑣,𝛼)
trality parameter 𝜆 (Montgomery, 2009):
where 𝑚 and 𝑣 are, respectively, the sample mean and the sample

variance of 𝑆𝑃 𝐸𝑉 𝐴𝐸 from the in-control data. Algorithm 1 shows the
𝜆 = 𝜹𝑇 𝜮 −1 𝑜 𝜹 (21)
procedure of the proposed VAE-based monitoring method.
We investigated the representational ability of VAE to cause low- where 𝜹 is the size of the mean shift and 𝜮 0 is the estimated covariance
dimensional data to follow a multivariate normal distribution. We matrix. 𝜹 can be computed by subtracting the mean vector from the

16
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

shifted mean vector. We generated three types of mean shifts (𝜆 =


1, 2, 3). As seen in Table 1, we used 24 simulation scenarios. For each
scenario, 50 replications were conducted to determine the average
values of the actual Type I and Type II error rates.

3.2. Model architecture and hyperparameter selection

For the experiments, we constructed a VAE model with a simple


architecture consisting of five fully connected hidden layers. Both the
encoder and the decoder are composed of two hidden layers, and
there is a single hidden layer for the latent variable in the middle.
We used leaky ReLU as an activation function for the hidden layers
Fig. 3. Example of a covariance matrix (block size = 2).
of the encoder and decoder because it enables the efficient training of
deep neural networks (Maas et al., 2013). Fig. 4 shows the schematic
structure of a VAE in which p is the number of variables for the original
data and k is the number of extracted variables. We also constructed

17
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 4. Number of hidden layers and hidden nodes of the VAE network.

Table 1
Summary of the 24 simulation scenarios.
Types Mean shift (𝑝 = 300) Mean shift (𝑝 = 500)
Small (𝜆 = 1) Medium (𝜆 = 2) Large (𝜆 = 3) Small (𝜆 = 1) Medium (𝜆 = 2) Large (𝜆 = 3)
Normal distribution 𝑁𝑝=300 1 𝑁𝑝=300 2 𝑁𝑝=300 3 𝑁𝑝=500 1 𝑁𝑝=500 2 𝑁𝑝=500 3
Lognormal distribution 𝐿𝑝=300 1 𝐿𝑝=300 2 𝐿𝑝=300 3 𝐿𝑝=500 1 𝐿𝑝=500 2 𝐿𝑝=500 3
t -distribution 𝑇𝑝=300 1 𝑇𝑝=300 2 𝑇𝑝=300 3 𝑇𝑝=500 1 𝑇𝑝=500 2 𝑇𝑝=500 3
Gamma distribution 𝐺𝑝=300 1 𝐺𝑝=300 2 𝐺𝑝=300 3 𝐺𝑝=500 1 𝐺𝑝=500 2 𝐺𝑝=500 3

Table 2 Table 3
Number of extracted features determined by the PCA. The exact proportion of the total p-values of Royston’s multivariate normality test (𝛼 = 0.05).
variance is shown in parentheses. PCA KPCA AE VAE
Types Number of the extracted features (k)
𝑁𝑝=300 0.869 0.841 0.946 0.836
Normal distribution
𝑝 = 300 𝑝 = 500 𝑁𝑝=500 0.085 0.137 0.692 0.812
Normal distribution 16 (79.3%) 16 (79.5%) 𝐿𝑝=300 0.000 0.000 0.000 0.297
Lognormal distribution
Lognormal distribution 17 (80.2%) 17 (80.0%) 𝐿𝑝=500 0.000 0.000 0.000 0.265
t -Distribution 16 (79.4%) 16 (80.4%)
𝑇𝑝=300 0.000 0.000 0.000 0.827
Gamma distribution 13 (79.6%) 22 (81.2%) t -Distribution
𝑇𝑝=500 0.000 0.000 0.000 0.815
𝐺𝑝=300 0.000 0.000 0.000 0.467
Gamma distribution
𝐺𝑝=500 0.000 0.000 0.000 0.210
an AE model with the same hyperparameters of the number of hidden
layers, hidden nodes, and type of activation functions. For the KPCA
model, we used a Gaussian kernel and the kernel parameters were Bengio, 2010) and updated through 400 epochs with a 200-batch size.
estimated to minimize the difference between actual Type I error rates We used mean squared error for the loss function to be minimized for
and expected Type I error rates in normal training data. training the AE models.
To determine the latent dimension 𝑘, we applied the number of
extracted features by PCA equally to KPCA, AE, and VAE because
3.3. Simulation results
the PCA algorithm, unlike other algorithms, has a straightforward and
systematic way to determine the number of reduced dimensions. The
PCA has many methods available to choose the number of extracted We performed Royston’s multivariate normality test to verify that
features; most of them involve some assumptions about the variance of the latent variables of the VAE followed a multivariate normal distribu-
each component (Jolliffe, 1986). We can determine the number of PCs tion (Royston, 1982). This test investigated the null hypothesis that the
with the proportion of the total variance explained by each PC, which tested data came from a population of multivariate normal distribution.
is given by the corresponding eigenvalue of the covariance matrix for If the p-value was less than the predetermined significance level 𝛼, then
the process data (Johnson and Wichern, 2007). For example, in the the null hypothesis was rejected and we obtained an evidence that the
𝑁𝑝=300 case, the choice of the first 15 PCs was regarded as the best data were not normally distributed. We trained the PCA, KPCA, AE,
approach. These 15 PCs accounted for 80.5% of the total variance. and VAE models according to each simulation case shown in Table 2
In other words, it requires only 15 variables to cover around 80% of and then performed Royston’s tests on the latent variables. We used
the original information (i.e., variance). Table 2 shows the number of an 𝛼 value of 0.05, which is commonly used in many statistical tests.
extracted features determined by the criteria, around 80% proportion Table 3 shows the p-values of the tests. We confirmed that all of the
of the total variance, for each scenario. latent variables projected from the normal distribution data followed a
As for training the VAE and AE models, we used an Adam optimizer normal distribution. However, only the VAE models produced normally
with a learning rate of 0.001 because this optimizer yields quicker distributed latent variables in nonnormal cases.
convergence when training deep networks (Kingma and Ba, 2014). We investigated the performance of the proposed VAE-based control
The models were initialized by a Xavier normal initializer (Glorot and charts by comparing the expected and actual Type I error rates. We

18
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 5. Changes in average actual Type I error rates corresponding to different expected Type I error rates for PCA-, KPCA-, AE-, and VAE-based charts in different simulation
cases: (a) 𝑁𝑝=300 , (b) 𝑁𝑝=500 , (c) 𝐿𝑝=300 , (d) 𝐿𝑝=500 , (e) 𝑇𝑝=300 , (f) 𝑇𝑝=500 , (g) 𝐺𝑝=300 , and (h) 𝐺𝑝=500 .

19
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 6. Actual Type II error rates corresponding to the actual Type I error rates for PCA-, KPCA-, AE-, and VAE-based charts in normal distribution cases: (a) 𝑁𝑝=300 1, (b) 𝑁𝑝=300 2,
(c) 𝑁𝑝=300 3, (d) 𝑁𝑝=500 1, (e) 𝑁𝑝=500 2, and (f) 𝑁𝑝=500 3.

calculated the average value of the actual Type I error rates correspond- the t -distribution, the false alarm rates of 𝑇 2 charts were relatively
ing to the different expected Type I error rates (0.01–0.1); the result is low compared with the lognormal and gamma distributions. Because
shown in Fig. 5. The control chart that yielded a similar value for the the decision boundary of a 𝑇 2 chart provides an ellipsoid, it might be
expected and actual Type I error rates would be considered as the better suitable when data have roughly symmetric and elliptical distributions.
one. In other words, control charts whose two-dimensional points of the Therefore, despite its being the t -distribution with high kurtosis, it
expected and actual Type I error rates were plotted near the 45◦ line exhibited generally lower false alarm rates than those of the lognormal
should be preferred. Fig. 5 confirms that the VAE-based 𝑇 2 charts were and gamma distributions with skewness properties.
closer to the 45◦ line than the other charts. Note that some lines for We can also examine the performance of the control charts by
the PCA- and AE-based charts of Fig. 5 are occluded because of similar comparing the actual false alarm rates and the actual misdetection
values. This result demonstrates that the proposed VAE-based chart is rates. A preferred chart will be the one that yields a lower misdetection
more stable than the traditional latent variable-based control charts in rate, given a similar actual false alarm rate. Because all charts have
terms of the number of false alarms. Among the nonnormal cases, in different ranges of actual Type I error rates, these charts are not

20
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 7. Actual Type II error rates corresponding to the actual Type I error rates for PCA, KPCA, AE, and VAE-based charts in lognormal distribution cases: (a) 𝐿𝑝=300 1, (b) 𝐿𝑝=300 2,
(c) 𝐿𝑝=300 3, (d) 𝐿𝑝=500 1, (e) 𝐿𝑝=500 2, and (f) 𝐿𝑝=500 3.

described with an equivalent range of the x-axis in Figs. 6–9. In Figs. 6– this superior performance can be attributed to the fact that the latent
9, each figure consists of a three-noncentrality parameter mean shift variables of the VAE were learned as a multivariate normal distribution.
size (𝜆 = 1, 2, 3), as shown, respectively, in subfigures (a)–(c) when As another quantity for comparison, ARL is commonly used when
𝑝 = 300 and in subfigures (d)–(f) when 𝑝 = 500. All simulation comparing the performance of different control charts. There are two
scenarios returned similar results in that the VAE-based charts yielded types of ARL: in-control ARL (ARL0 ) and out-of-control ARL (ARL1 ).
lower misdetection rates than those of the other charts, given similar ARL0 is the expected number of observations from the beginning of a
false alarm rates. That is, the VAE-based charts consistently yielded process to the first alarm occurrence when the process is in control. By
the best results. The difference between the proposed VAE-based chart contrast, ARL1 is the expected number of observations from the first
and the existing latent variable-based charts was clearly noticed in appearance of an actual out-of-control observation to the time of the
situations with large mean shifts compared with small mean shifts. The first detection. Usually, the value of ARL0 is prespecified and charts
performance of the VAE-based chart was especially superior to those of that have a lower ARL1 value are considered superior.
existing latent variable-based charts when the original data sets were Similar to the previous comparisons, we conducted 24 simulation
generated from a nonnormal distribution. We expected that the VAE- cases, as illustrated in Table 1. Table 4 shows the ARL1 of the existing
based 𝑇 2 chart would perform best in these scenarios. We believe that latent variable-based control charts and of the proposed VAE-based

21
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 8. Actual Type II error rates corresponding to the actual Type I error rates for PCA-, KPCA-, AE-, and VAE-based charts in t -distribution cases: (a) 𝑇𝑝=300 1, (b) 𝑇𝑝=300 2, (c)
𝑇𝑝=300 3, (d) 𝑇𝑝=500 1, (e) 𝑇𝑝=500 2, and (f) 𝑇𝑝=500 3.

control charts, given that the desired ARL0 = 370. The values in etching, stripping, and inspection. Each process is repeated several
parentheses indicate standard deviations. times to create a multilayered product. Fig. 10 illustrates an overview
As shown in Table 4, the proposed VAE-based chart had lower ARL1 of the TFT-LCD creation process.
than those of the other charts. This means that the VAE-based charts The plasma-enhanced chemical vapor deposition (PECVD) process is
were quicker to detect out-of-control observations than the existing one of the most popular deposition methods. It creates silicon nitride,
latent variable-based charts. amorphous silicon, and n+silicon films that are used for semiconductors
and insulator films. This process is perceived as important because this
4. Case study: TFT-LCD manufacturing process step determines the performance of the transistor’s components. Of the
steps in the PECVD process, we focused on the silicon nitride step,
4.1. Description of the TFT-LCD manufacturing process which produces a gate insulator layer designed to prevent short circuits.
This step involves a complex procedure. In a process in which a thin-
The manufacture of a TFT-LCD includes a sequence of processes: film material is converted from a gaseous to a solid state, a thin film
deposition, cleaning, photo resistance coating, exposure, developing, is deposited on the substrate. Some chemical reactions are associated

22
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 9. Actual Type II error rates corresponding to the actual Type I error rates for PCA-, KPCA-, AE-, and VAE-based charts in gamma distribution cases: (a) 𝐺𝑝=300 1, (b) 𝐺𝑝=300 2,
(c) 𝐺𝑝=300 3, (d) 𝐺𝑝=500 1, (e) 𝐺𝑝=500 2, and (f) 𝐺𝑝=500 3.

with this process after the reaction gas is converted into plasma. This 4.2. Experimental results of the case study
conversion occurs when a radiofrequency discharge ignites the gas in
For this experiment, we used 2000 observations characterized by
a space between two electrodes. The quality of the silicon nitride film
42 quality characteristics from the PECVD process to compare the
may be affected by such factors as chamber pressure, gas flow rates, performance of the VAE-based chart and existing latent variable-based
temperature, and radiofrequency power. These quality characteristics charts. We used 1000 in-control observations of the process for the
not only affect the quality of the final product but also can cause Phase I analysis. Two thousand observations (1000 in-control and 1000
out-of-control) were used to evaluate the control charts in terms of Type
failure of the equipment. Therefore, an efficient and reliable monitor-
I and Type II error rates. Because of confidentiality agreements with the
ing tool is required to detect any fluctuations that occur during this source of this data set, we cannot disclose the names of the variables
process. and their explanations in this paper.

23
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Table 4
Average value of ARL1 with PCA-, KPCA-, AE-, and VAE-based charts for different scenarios given that the actual ARL0 = 370. Average values
of the standard deviations are shown in parentheses.
PCA-based chart KPCA-based chart AE-based chart VAE-based chart
𝑁𝑝=300 1 116.6 (31.5) 79.8 (28.0) 117.2 (28.9) 48.9 (12.4)
𝑁𝑝=300 2 37.1 (7.4) 26.1 (8.1) 34.9 (5.6) 14.0 (2.9)
𝑁𝑝=300 3 9.1 (1.9) 8.2 (2.1) 8.5 (1.3) 4.5 (0.8)
Normal distribution
𝑁𝑝=500 1 88.6 (18.4) 75.8 (14.5) 94.9 (15.3) 50.9 (28.4)
𝑁𝑝=500 2 36.1 (6.0) 30.7 (10.5) 35.4 (8.2) 19.4 (8.9)
𝑁𝑝=500 3 8.5 (1.4) 9.3 (2.9) 8.6 (2.2) 6.1 (1.9)
𝐿𝑝=300 1 92.9 (55.3) 95.2 (53.1) 81.1 (33.1) 27.0 (4.1)
𝐿𝑝=300 2 28.9 (11.6) 31.9 (15.3) 26.4 (7.2) 10.7 (1.0)
𝐿𝑝=300 3 13.0 (3.8) 14.3 (5.0) 12.6 (3.0) 5.4 (0.6)
Lognormal distribution
𝐿𝑝=500 1 78.5 (22.7) 85.6 (24.1) 78.3 (23.9) 19.6 (1.5)
𝐿𝑝=500 2 30.0 (7.7) 31.8 (8.0) 27.5 (6.5) 8.0 (0.5)
𝐿𝑝=500 3 14.5 (2.8) 15.9 (3.3) 13.4 (2.3) 4.4 (0.1)
𝑇𝑝=300 1 301.9 (101.3) 353.1 (157.5) 292.9 (110.2) 80.3 (22.5)
𝑇𝑝=300 2 242.3 (122.9) 283.9 (176.1) 188.3 (96.5) 68.2 (23.1)
𝑇𝑝=300 3 120.5 (27.4) 251.3 (180.2) 95.9 (37.4) 34.9 (8.0)
t -distribution
𝑇𝑝=500 1 383.5 (194.0) 370.5 (265.4) 317.2 (170.1) 103.6 (48.5)
𝑇𝑝=500 2 183.7 (57.2) 198.9 (94.3) 157.9 (49.5) 82.7 (25.4)
𝑇𝑝=500 3 135.5 (23.8) 221.8 (132.9) 107.2 (23.8) 57.5 (26.1)
𝐺𝑝=300 1 225.2 (29.8) 244.2 (43.9) 236.2 (30.2) 88.9 (26.6)
𝐺𝑝=300 2 106.9 (17.2) 123.8 (26.9) 110.8 (20.9) 48.9 (9.1)
𝐺𝑝=300 3 44.3 (5.6) 59.3 (10.9) 45.5 (6.4) 22.2 (4.9)
Gamma distribution
𝐺𝑝=500 1 306.3 (81.6) 348.9 (102.5) 296.1 (105.8) 102.6 (22.1)
𝐺𝑝=500 2 135.7 (13.4) 223.9 (36.2) 136.5 (35.3) 62.0 (30.2)
𝐺𝑝=500 3 65.2 (11.3) 152.9 (46.8) 64.0 (12.2) 25.3 (5.8)

considered the 𝑇 2 chart and the SPE chart simultaneously. However,


as clearly shown in Fig. 11, the 𝑇 2 statistic of the VAE-based charts,
both VAE(1) and VAE(2), outperformed those of the other charts in
detecting real faults.
Fig. 12(a) indicates the performance of the 𝑇 2 and SPE charts
in terms of the actual false alarm rates from the PCA-, KPCA-, AE-
, and VAE-based charts over a range of expected false alarm rates.
Similar expected and actual type I error rates are desirable in a control
chart. Results from the proposed VAE-based chart had more similarity
between the actual and the expected Type I error rates than those
of the existing latent variable-based charts. This result confirms that
the proposed VAE-based chart appropriately uses the 𝑇 2 statistic in
the reduced space. Fig. 12(b) shows the actual Type II error rates
corresponding to the actual Type I error rates of the existing methods
and the VAE-based charts. Fig. 12(b) shows that the proposed chart is
superior to the existing latent variable-based charts in terms of actual
misdetection rates. In addition, it can be seen that the VAE(2)-based
chart showed slightly better performance than that of the VAE(1)-based
chart, although the performance difference was not so large.
In addition to actual Type I and Type II error rates, we compared
the charts in terms of ARL. We calculated the ARL1 of the existing latent
variable-based charts and of the proposed VAE-based chart, given that
the actual ARL0 = 370. The control chart with the smallest actual ARL1
Fig. 10. Overview of the TFT-LCD process. was considered the best. The ARLs determined for the PCA-, KPCA-,
AE-, VAE(1)- and VAE(2)-based charts were 1.45, 1.61, 1.49, 1.21, and
1.20, respectively, which indicates that the proposed VAE-based chart
Note that all experimental settings were equal to the simulation could detect out-of-control observations earlier than the existing charts.
settings. For usage of the proposed VAE-based chart in a real-world
setting, we constructed an extra VAE model consisting of three fully 5. Conclusions
connected hidden layers. Compared with the VAE model of the sim-
ulation experiments, there was only one hidden layer in both the This study is the first attempt to use VAE for multivariate process
encoder and the decoder. The number of dimensions of the hidden monitoring in high-dimensional nonlinear and nonnormal situations.
layers was half the size of the input variable. We denoted by VAE(1) VAE networks have been used primarily for generative models in deep
and VAE(2) the VAE models with three hidden layers and five hidden learning fields. However, they have not yet been applied in the context
layers, respectively. of process monitoring. Although the artificial neural network approach
The monitoring results of the PCA-, KPCA-, AE-, and VAE-based has been used in advanced process monitoring, it has not addressed the
charts when the actual type I error rate was 0.05 are shown in explicit assumption that the data are distributed normally in a projected
Fig. 11(a–e), respectively. As mentioned earlier, the faults in the space. Use of the proposed VAE chart is expected to reduce both
process occurred between the 1001st and the 2000th observations, and unwanted false alarms and misdetections in process control because
the actual faults are plotted in red. To evaluate the performance, we it can promptly reduce the original data to a multivariate normal

24
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 11. Comparisons of the monitoring statistics from the (a) PCA-based chart, (b) KPCA-based chart, (c) AE-based chart, (d) VAE(1)-based chart, and (e) VAE(2)-based chart
with TFT-LCD process data. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

distribution. The results of the simulations and of the case study from Acknowledgments
the TFT-LCD manufacturing process show that the proposed VAE-based
chart outperforms the existing latent variable-based charts. The authors would like to thank the editor and reviewers for their
useful comments and suggestions, which were greatly help in improv-
Although the current study focuses on process monitoring to detect
ing the quality of the paper. This research was supported by Brain Korea
faults, this idea can be extended to the interpretation and isolation of
PLUS, Basic Science Research Program through the National Research
out-of-control variables in multivariate problems. We believe that the Foundation of Korea funded by the Ministry of Science, ICT and Fu-
proposed methods can be a cornerstone for using the VAE algorithm ture Planning, South Korea (NRF-2016R1A2B1008994), the Ministry
in process monitoring and may be profitably paired in many process of Trade, Industry & Energy, South Korea under Industrial Technol-
monitoring situations in manufacturing, such as multimodality and ogy Innovation Program (R1623371), and by Institute for Information
time-varying characteristics. &communications Technology Promotion grant funded by the Korea

25
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Fig. 12. Comparisons between the existing latent variable-based charts and the VAE-based chart with TFT-LCD process data: (a) actual Type I error rates corresponding to the
expected Type I error rates of the 𝑇 2 and SPE charts, and (b) actual Type II error rates corresponding to the actual Type I error rates.

government (No. 2018-0-00440, ICT-based Crime Risk Prediction and Johnson, R.A., Wichern, D.W., 2007. Applied Multivariate Statistical Analysis. Pearson
Response Platform Development for Early Awareness of Risk Situation). Education International.
Jolliffe, I.T., 1986. Principal component analysis and factor analysis. In: Principal
Component Analysis. Springer, New York, pp. 115–128.
References Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Aguado, D., Rosen, C., 2008. Multivariate statistical monitoring of continuous Kingma, D.P., Mohamed, S., Rezende, D.J., Welling, M., 2014. Semi-supervised learning
wastewater treatment plants. Eng. Appl. Artif. Intell. 21 (7), 1080–1091. with deep generative models. Adv. Neural Inf. Process. Syst. 3581–3589.
An, J., Cho, S., 2015. Variational Autoencoder Based Anomaly Detection using Kingma, D.P., Welling, M., 2014. Auto-encoding variational bayes. arXiv preprint
Reconstruction Probability. SNU Data Min. Cent. Tech. Rep. arXiv:1312.6114.
Botía, J.F., Isaza, C., Kempowsky, T., Le Lann, M.V., Aguilar-Martín, J., 2013. Automa- Kourti, T., 2005. Application of latent variable methods to process control and
ton based on fuzzy clustering methods for monitoring industrial processes. Eng. multivariate statistical process control in industry. Int. J. Adapt. Control. Signal.
Appl. Artif. Intell. 26 (4), 1211–1220. Process. 19, 213–246.
Box, G.E., 1954. Some theorems on quadratic forms applied in the study of analysis of Kourti, T., MacGregor, J.F., 1995. Process analysis, monitoring and diagnosis, using
variance problems: I. Effect of inequality of variance in the one-way classification. multivariate projection methods. Chem. Intell. Lab. Syst. 28, 3–21.
Ann. Math. Stat. 25, 290–302. Ku, W., Storer, R.H., Georgakis, C., 1995. Disturbance detection and isolation by
Burgas, L., Melendez, J., Colomer, J., Massana, J., Pous, C., 2018. N-dimensional dynamic principal component analysis. Chem. Intell. Lab. Syst. 30, 179–196.
extension of unfold-PCA for granular systems monitoring. Eng. Appl. Artif. Intell. Lee, J.M., Yoo, C., Choi, S.W., Vanrolleghem, P.A., Lee, I.B., 2004. Nonlinear pro-
71, 113–124. cess monitoring using kernel principal component analysis. Chem. Eng. Sci. 59,
Cheng, C., Chiu, M.S., 2005. Nonlinear process monitoring using JITL-PCA. Chem. Intell. 223–234.
Lab. Syst. 76, 1–13. Li, N., Yang, Y., 2014. Ensemble kernel principal component analysis for improved
Choi, S.W., Lee, C., Lee, J.M., Park, J.H., Lee, I.B., 2005. Fault detection and nonlinear process monitoring. Ind. Eng. Chem. Res. 54, 318–329.
identification of nonlinear processes based on kernel PCA. Chem. Intell. Lab. Syst. Lowry, C.A., Montgomery, D.C., 1995. A review of multivariate control charts. IIE
75, 55–67. Trans. 27, 800–810.
Daszykowski, M., Walczak, B., Massart, D.L., 2003. A journey into low-dimensional Lu, X.S., 1998. Control chart for multivariate attribute processes. Int. J. Prod. Res. 36,
spaces with autoassociative neural networks. Talanta 59, 1095–1105. 3477–3489.
Doersch, C., 2016. Tutorial on variational autoencoders. arXiv preprint arXiv:1606. Lv, F., Wen, C., Bao, Z., Liu, M., 2016. Fault diagnosis based on deep learning. Am.
05908. Control. Conf. IEEE 685, 1–6856.
Dong, D., McAvoy, T.J., 1996. Nonlinear principal component analysis-based on Maas, A.L., Hannun, A.Y., Ng, A.Y., 2013. Rectifier nonlinearities improve neural
principal curves and neural networks. Comput. Chem. Eng. 20, 65–78. network acoustic models. In: Proc. 30th Int. Conf. Mach. Learn. vol. 30, p. 3.
Ge, Z., 2017. Review on data-driven modeling and monitoring for plant-wide industrial Mason, R.L., Young, J.C., 2002. Multivariate statistical process control with industrial
processes. Chem. Intell. Lab. Syst. 171, 16–25. applications. Soc. Ind. Appl. Math.
Ge, Z., 2018. Process data analytics via probabilistic latent variable models: A tutorial Mastrangelo, C.M., Runger, G.C., Montgomery, D.C., 1996. Statistical process
review. Ind. Eng. Chem. Res. 57, 12646–12661. monitoring with principal components. Qual. Reliab. Eng. Int. 12, 203–210.
Ge, Z., Song, Z., Gao, F., 2013. Review of recent research on data-based process Maulud, A., Wang, D., Romagnoli, J.A., 2006. A multi-scale orthogonal nonlinear
monitoring. Ind. Eng. Chem. Res. 52, 3543–3562. strategy for multi-variate statistical process monitoring. J. Process Control 16,
Ge, Z., Yang, C., Song, Z., 2009. Improved kernel PCA-based monitoring approach for 671–683.
nonlinear processes. Chem. Eng. Sci. 64, 2245–2255. Montgomery, D.C., 2009. Introduction to Statistical Quality Control. John Wiley & Sons,
Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward New York.
neural networks. In: Proc. 13th Int. Conf. Artif. Intell. Stat. 249–256. Proc. 30th Nair, V., Hinton, G.E., 2010. Rectified linear units improve restricted Boltzmann
Int. Conf. Mach. Learn. vol. 30, p. 3. machines. In: Proc. 27th Int. Conf. Mach. Learn. pp. 807–814.
He, Q.P., Wang, J., 2017. Statistical process monitoring as a big data analytics tool for Phaladiganon, P., Kim, S.B., Chen, V.C., Jiang, W., 2013. Principal component analysis-
smart manufacturing. J. Process Control 67, 35–43. based control charts for multivariate nonnormal distributions. Exp. Syst. Appl. 40,
Hotelling, H., 1947. Multivariate quality control. Tech. Stat. Anal. 11, 1–184. 3044–3054.
Hu, Y., Palmé, T., Fink, O., 2017. Fault detection based on signal reconstruction with Pham, H., 2001. Recent Advances in Reliability and Quality Engineering, second ed.
auto-associative extreme learning machines. Eng. Appl. Artif. Intell. 57, 105–117. World Scientific.
Jackson, J.E., 1959. Quality control methods for several related variables. Royston, J.P., 1982. An extension of shapiro and wilk’s w test for normality to large
Technometrics 1, 359–377. samples. Appl. Stat. 31, 115–124.
Jackson, J.E., Mudholkar, G.S., 1979. Control procedures for residuals associated with Seborg, D.E., Mellichamp, D.A., Edgar, T.F., Doyle, III, F.J., 2010. Process Dynamics
principal component analysis. Technometrics 21, 341–349. and Control. John Wiley & Sons, New Jersey.

26
S. Lee, M. Kwak, K.-L. Tsui et al. Engineering Applications of Artificial Intelligence 83 (2019) 13–27

Shaheryar, A., Yin, X.C., Hao, H.W., Ali, H., Iqbal, K., 2016. A denoising based Wise, B.M., Gallagher, N.B., 1996. The process chemometrics approach to process
autoassociative model for robust sensor monitoring in nuclear power plants. Sci. monitoring and fault detection. J. Process Control 6, 329–348.
Technol. Nucl. Ins. Woodall, W.H., Hoerl, R.W., Palm, A.C., Wheeler, D.J., 2000. Controversies and
Shlens, J., 2014. A tutorial on principal component analysis. arXiv preprint arXiv: contradictions in statistical process control/discussion/response. J. Qual. Technol.
1404.1100. 32, 341–377.
Suh, S., Chae, D.H., Kang, H.G., Choi, S., 2016. Echo-state conditional variational Yao, L., Ge, Z., 2018. Deep learning of semisupervised process data with hierarchical
autoencoder for anomaly detection. Neural Netw. Int. Jt. Conf. IEEE 101, 5–1022. extreme learning machine and soft sensor application. IEEE Trans. Ind. Electron
Sukchotrat, T., Kim, S.B., Tsung, F., 2009. One-class classification-based control charts 65, 1490–1498.
for multivariate process monitoring. IIE Trans. 42, 107–120. Yin, S., Ding, S.X., Haghani, A., Hao, H., Zhang, P., 2012. A comparison study of
Sun, Q., Ge, Z., 2018. Probabilistic sequential network for deep learning of complex basic data-driven fault diagnosis and process monitoring methods on the benchmark
process data and soft sensor application. IEEE Trans. Ind. Inform. Tennessee Eastman process. J. Process Control 22, 1567–1581.
Thulin, M., 2014. A high-dimensional two-sample test for the mean using random Zhang, N., Tian, X.M., Cai, L.F., 2013. Nonlinear dynamic fault diagnosis method based
subspaces. Comput. Stat. Data Anal. 74, 26–38. on DAutoencoder. In: Meas. Technol. Mech. Autom. 2013 Fifth Int. Conf. IEEE, pp.
Van Sprang, E.N., Ramaker, H.J., Westerhuis, J.A., Gurden, S.P., Smilde, A.K., 2002. 729–732.
Critical evaluation of approaches for on-line batch process monitoring. Chem. Eng. Zhao, S., Xu, Y., 2005. Multivariate statistical process monitoring using robust nonlinear
Sci. 57 (18), 3979–3991. principal component analysis. Tsinghua Sci. Technol. 10, 582–586.

27

You might also like