


Multisubspace Orthogonal Canonical Correlation Analysis for Quality-Related Plant-Wide Process Monitoring

Bing Song, Hongbo Shi, Shuai Tan, and Yang Tao

Abstract—Plant-wide processes often have the characteristics of large scale and multiple operating units. Moreover, due to closed-loop control, a fault may never affect product quality. In this article, a novel data-driven method called multisubspace orthogonal canonical correlation analysis (CCA) is proposed, which can not only tell whether a fault occurs but also judge in real time whether the fault affects product quality. First, to reduce process analysis complexity and to construct an accurate monitoring model, the original process variable space is divided into four subspaces. Second, the developed orthogonal CCA is conducted on process data and quality data for correlation feature extraction. Then, the quality-related and quality-unrelated features are obtained. Afterward, a total of six monitoring statistics are constructed and integrated into four statistics with physical interpretation via the Bayesian fusion strategy. Finally, the developed method is tested on an industrial case.

Index Terms—Data-driven, plant-wide, process monitoring, quality related, real-time.

Manuscript received February 24, 2020; revised May 12, 2020 and June 25, 2020; accepted July 26, 2020. Date of publication August 7, 2020; date of current version June 16, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 61673173, Grant 61703161, Grant 61673178, and Grant 61673177, in part by the National Natural Science Foundation of Shanghai under Grant 19ZR1473200 and Grant 17ZR1444700, and in part by Fundamental Research Funds for the Central Universities under Grant 222201717006. Paper no. TII-20-0973. (Corresponding author: Hongbo Shi.)

The authors are with the Key Laboratory of Advanced Control and Optimization for Chemical Processes of the Ministry of Education, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China (e-mail: …).

Color versions of one or more of the figures in this article are available online at …

Digital Object Identifier 10.1109/TII.2020.3015034

1551-3203 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See … for more information.

Authorized licensed use limited to: UNIVERSIDADE FEDERAL DE SANTA CATARINA. Downloaded on September 23,2022 at 18:58:19 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Process monitoring helps achieve safe production, improved quality, lower costs, energy savings, and reduced pollution by automatically adjusting and manipulating the variables that affect process conditions. With the wide application of sensor technology, real-time storage technology, and information management systems, a large amount of industrial process data is available. Multivariate statistical process monitoring (MSPM) methods, which are typical representatives of data-driven process monitoring methods, are widely studied [1]–[3]. However, most MSPM methods impose certain requirements on the data distribution and ignore whether a fault affects product quality, which makes them difficult to apply to modern plant-wide processes.

To improve economic efficiency, plant-wide processes are generally large in scale and contain multiple operating units. The correlations among the collected data are complex, and the data distribution characteristics vary. For plant-wide process monitoring, the most commonly used methods are distributed, multiblock, or multisubspace process monitoring, whose core idea is to divide the entire plant-wide process into different subspaces according to certain criteria and then monitor each subspace. To minimize the influence of different data characteristics, the modeling process needs to group variables with similar data characteristics into the dataset of each subspace. Subspace division can be conducted using process knowledge, expert systems, statistical data analysis, and engineering experience [4]. It enables earlier fault detection and reduces the complexity of fault diagnosis [5]. At present, subspace partition through variable selection has been widely used in the field of MSPM. Ge [6] reviewed process decomposition through variable selection in plant-wide processes and summarized the opportunities and challenges in this area. The effect of variable selection on process monitoring has been analyzed in detail [7]. Meanwhile, closed-loop control makes it possible that a fault never affects product quality, yet most plant-wide process monitoring methods are unsupervised and do not consider whether the fault has an impact on product quality.

Closed-loop control is ubiquitous in modern plant-wide processes to compensate for faults. Hence, not every fault causes deterioration of product quality. According to whether it affects product quality, a fault can be divided into three categories: quality related, quality semirelated, and quality unrelated [8]. From a cost-effectiveness perspective, there may be no need to spend a lot of manpower and resources on immediate action when a quality-unrelated fault happens. To judge whether a fault influences product quality, quality-related process monitoring methods have been extensively studied [9]–[12]. In these methods, a feature matrix that characterizes the relationship between the process variables and the quality variables is analyzed and calculated, and the process variable space is projected onto a quality-related subspace and a quality-unrelated subspace. Partial least squares (PLS) is a

backbone quality-related process monitoring method that enables regression modeling in the presence of multicollinearity among the independent variables. Given that some components obtained by PLS may be orthogonal to the outputs and that large variability remains in the residual space, the total projection to latent structures (TPLS) algorithm, a postprocessing modification of PLS, was proposed [13]. Afterward, TPLS was extended to a dynamic version, dynamic TPLS [14]. However, the false alarm rates of TPLS increase with the fault amplitude. To solve this problem, a quality-related process monitoring method combining orthogonal signal correction with a modified PLS was proposed [15]. Considering that the principal component analysis (PCA) used in TPLS to decompose the residual space is unsupervised, so that the residual space cannot be decomposed according to its relationship with quality, an efficient quality-related process monitoring method called improved PLS was presented [16]. In addition, to monitor both input and output variables, concurrent PLS with five monitoring statistics was developed [17]. To avoid iterative computation, efficient projection to latent structures (EPLS) was developed, in which the original space is projected onto a quality-related subspace, a quality-unrelated subspace, and a residual subspace [18].

Similar to the PLS-based methods, principal component regression (PCR) is also used for quality-related process monitoring. To exploit labeled and unlabeled data simultaneously for a more accurate estimate of the input data distribution, a semisupervised PCR model was established [19]. Given that process characteristics change with catalyst deactivation, production unit aging, and so on, a locally weighted kernel PCR method using a just-in-time learning strategy was proposed to handle nonlinear and time-variant behavior simultaneously [20].

In contrast to PLS, which extracts the maximum covariance between two datasets, canonical correlation analysis (CCA) extracts the maximum multidimensional correlation between two datasets [21]. The CCA-based method in [22] requires both input and output data and requires both to be measurable online; a canonical correlation-based residual was then defined in a data-driven fashion for fault detection. To deal with non-Gaussian data, an extension of CCA whose threshold is set by a randomized algorithm was presented and applied to a simulated traction drive control system [23]. To improve the detection of incipient multiplicative faults, a method combining CCA with the statistical local approach was proposed, in which two statistics are used for changes in the input and output variables [24]. Considering that CCA neglects data variance and that collinearity exists in the process, a concurrent CCA method with regularization was proposed, in which variance and covariance information are used efficiently [25]. For uneven sampling of process and quality variables, a concurrent dynamic CCA modeling method was proposed [26]. A variational Bayesian mixture of CCA method, which uses the Student's t-distribution instead of the Gaussian distribution to improve robustness, was developed to predict and diagnose product quality [27]. In addition, CCA is also used for plant-wide process fault detection, and a genetic algorithm regularized CCA was proposed [28]. In summary, CCA has been used for quality-related process monitoring and for plant-wide process monitoring separately. To the best of our knowledge, CCA-based quality-related plant-wide process monitoring has not been studied. Moreover, the various units of a plant-wide process may have different cumulative effects on product quality, and big data may bring computational issues. Compared with ordinary processes, quality-related monitoring of a plant-wide process is therefore more challenging.

To judge whether a fault affects product quality under the closed-loop control of a plant-wide process, this work proposes a novel data-driven process monitoring method named multisubspace orthogonal CCA (MOCCA). First, to construct an accurate quality-related monitoring model, process variables are partitioned into quality-related and quality-unrelated variables according to their relevance to the quality variables. Then, to reduce complexity and to facilitate the determination of control limits, the quality-related and quality-unrelated process variables are further divided according to whether each variable follows a Gaussian distribution. As a result, process variables are partitioned into quality-related Gaussian variables (QRGV), quality-unrelated Gaussian variables (QUGV), quality-related non-Gaussian variables (QRNV), and quality-unrelated non-Gaussian variables (QUNV). Through this subspace division, the plant-wide process is decomposed into lower dimensional subsystems with meaningful information. Second, quality-related features are extracted from QRGV and QRNV via the developed orthogonal CCA (OCCA) method, which decomposes both QRGV and QRNV into quality-related and quality-unrelated information. The quality-related information is used to establish quality-related monitoring statistics. The quality-unrelated information, together with QUGV and QUNV, is used to establish quality-unrelated monitoring statistics. In total, six monitoring statistics with meaningful interpretations are constructed. Then statistics of the same type are integrated through the Bayesian fusion strategy to obtain comprehensive monitoring results. Unlike traditional MSPM methods, the proposed MOCCA method gives a more meaningful indication of the process condition.

The contributions of this article are as follows.
1) A novel MOCCA method, which can judge in real time whether a fault occurs and whether it affects product quality, is developed for quality-related plant-wide process monitoring.
2) A novel subspace division method considering both relevance to product quality and data distribution is proposed, in which both supervised and unsupervised partition methods are used.
3) Through the subspace division, the dimension of each subspace is greatly reduced for the plant-wide process. The quality-related monitoring model is established based on quality-related variables only, which makes the model more accurate.
4) The OCCA method is proposed, which solves the real-time quality-related monitoring and quality-related information extraction problems of traditional CCA-based quality-related process monitoring methods.
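The four-subspace division outlined above (detailed later in (14)–(25)) can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the relevance step uses a hypothetical coordinate-descent helper `elastic_net_cd` that solves the penalized (Lagrangian) form of the elastic net instead of the constrained form with bound t, the Gaussianity check computes the Jarque–Bera statistic and its asymptotic chi-square (2 degrees of freedom) p-value, and the helper names, penalty values, ε, and significance level are all illustrative choices. The data are synthetic, not TE data.

```python
import numpy as np


def zscore(A):
    """Column-wise z-score standardization (population std, ddof=0)."""
    return (A - A.mean(axis=0)) / A.std(axis=0)


def elastic_net_cd(X, y, alpha=0.1, rho=0.5, n_sweeps=500, tol=1e-10):
    """Hypothetical helper: coordinate descent for the penalized elastic net
        (1/(2n))*||y - X b||^2 + alpha*(rho*||b||_1 + (1 - rho)/2*||b||^2).
    Assumption: this Lagrangian form stands in for the constrained form (14)."""
    n, m = X.shape
    b = np.zeros(m)
    r = y.copy()                              # current residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(m):
            r += X[:, j] * b[j]               # remove variable j from the fit
            z = X[:, j] @ r / n
            # soft-thresholding yields exact zeros for irrelevant variables
            new = np.sign(z) * max(abs(z) - alpha * rho, 0.0)
            new /= col_sq[j] + alpha * (1.0 - rho)
            r -= X[:, j] * new                # add variable j back with its new weight
            max_step = max(max_step, abs(new - b[j]))
            b[j] = new
        if max_step < tol:
            break
    return b


def jarque_bera_pvalue(x):
    """Jarque-Bera statistic as in (19)-(21) and its asymptotic p-value.
    For a chi-square variable with 2 degrees of freedom, P(chi2 > s) = exp(-s/2)."""
    n = x.size
    d = x - x.mean()
    s = x.std()                               # assumption: population std (ddof=0)
    skew = (d ** 3).sum() / (n * s ** 3)
    kurt = (d ** 4).sum() / (n * s ** 4)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return float(np.exp(-jb / 2.0))


def partition_subspaces(X, Y, alpha=0.1, rho=0.5, eps=1e-6, sig=0.05):
    """Hypothetical helper: split column indices of X into QRGV, QUGV, QRNV, QUNV."""
    Xs, Ys = zscore(X), zscore(Y)
    related = np.zeros(X.shape[1], dtype=bool)
    for j in range(Ys.shape[1]):              # union over quality variables, as in (15)
        beta = elastic_net_cd(Xs, Ys[:, j], alpha, rho)
        related |= np.abs(beta) > eps
    gaussian = np.array([jarque_bera_pvalue(X[:, i]) >= sig for i in range(X.shape[1])])
    idx = np.arange(X.shape[1])
    return {                                  # intersections (22)-(25)
        "QRGV": idx[related & gaussian].tolist(),
        "QUGV": idx[~related & gaussian].tolist(),
        "QRNV": idx[related & ~gaussian].tolist(),
        "QUNV": idx[~related & ~gaussian].tolist(),
    }


# Synthetic data: x0, x1, x2 Gaussian; x3 skewed (exponential); x4 uniform.
# Only x0, x1, and x3 drive the single quality variable y.
rng = np.random.default_rng(0)
n = 2000
x0, x1, x2 = rng.standard_normal((3, n))
x3 = rng.exponential(size=n)
x4 = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([x0, x1, x2, x3, x4])
y = x0 + 0.8 * x1 + 0.8 * (x3 - 1.0) + 0.5 * rng.standard_normal(n)
# A very strict level keeps this toy example stable; 0.05 is a common default.
result = partition_subspaces(X, y[:, None], sig=1e-6)
print(result)
```

Because the Jarque–Bera p-value only drives a binary split, the closed-form chi-square tail avoids any statistics library here; on real plant data a conventional level such as 0.05, and cross-validated penalty values, would be the more usual choices.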


II. PRELIMINARIES

A. Canonical Correlation Analysis

Suppose $X \in \mathbb{R}^{n\times m_x}$ is a dataset of process variables and $Y \in \mathbb{R}^{n\times m_y}$ is a dataset of quality variables. First, both $X$ and $Y$ are mean-centered; then

$$\begin{bmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{bmatrix} \approx \frac{1}{n-1}\begin{bmatrix} X^T X & X^T Y \\ Y^T X & Y^T Y \end{bmatrix}. \tag{1}$$

The objective of CCA is to obtain the canonical correlation vectors $J_i$ and $G_i$ through the following equation [27]:

$$\arg\max_{J_i,\,G_i} \frac{J_i^T \Sigma_{XY} G_i}{\left(J_i^T \Sigma_X J_i\right)^{1/2}\left(G_i^T \Sigma_Y G_i\right)^{1/2}}. \tag{2}$$

The solution to (2) is computed by defining the matrix

$$\Gamma = \Sigma_X^{-1/2}\,\Sigma_{XY}\,\Sigma_Y^{-1/2}. \tag{3}$$

Singular value decomposition (SVD) is conducted on $\Gamma$ as

$$\Gamma = L \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix} M^T \tag{4}$$

where $L \in \mathbb{R}^{m_x\times m_x}$ is a unitary matrix, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_h)$ is a diagonal matrix whose entries $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_h > 0$, with $h = \mathrm{rank}(\Sigma_{XY})$, are the singular values, and $M \in \mathbb{R}^{m_y\times m_y}$ is a unitary matrix.

Two canonical correlation matrices are computed as

$$\begin{aligned} J &= [J_1, J_2, \ldots, J_h] = \Sigma_X^{-1/2} L(:, 1{:}h) \\ G &= [G_1, G_2, \ldots, G_h] = \Sigma_Y^{-1/2} M(:, 1{:}h). \end{aligned} \tag{5}$$

B. CCA-Based Process Monitoring

The residual is defined as [22]

$$r(i) = G^T y(i) - \Lambda J^T x(i). \tag{6}$$

Similar to the SPE statistic in PCA, a quadratic statistic is established as

$$\mathrm{SPE}(i) = r(i)^T r(i). \tag{7}$$

Moreover, two other statistics are constructed as [24]

$$T_x^2(i) = \left(J^T x(i)\right)^T J^T x(i) \tag{8}$$

$$T_y^2(i) = \left(G^T y(i)\right)^T G^T y(i). \tag{9}$$

C. Problem Formulation

The CCA-based process monitoring method can detect whether a fault occurs by constructing the monitoring statistics in (7)–(9). However, three problems need to be solved when it is used for quality-related plant-wide process monitoring. First, unlike PCA-based methods, which depend on latent variables, the CCA-based method depends on the canonical correlation residual in (6). Since quality variables are not easy to collect in real time, the canonical correlation residual cannot be computed in real time in the online monitoring phase. Consequently, the statistic in (7) cannot be computed online either, and the statistic in (9), which depends on $y(i)$, is likewise difficult to construct in real time. Second, although the monitoring statistic in (8) can detect faults in process variables and can be calculated in real time, the canonical correlation component $J^T x(i)$ may contain variations orthogonal to the quality variables $Y$. Hence, quality-related and quality-unrelated information cannot be effectively separated, and the statistic in (8) cannot indicate whether a fault is quality related or quality unrelated. Third, the distribution of plant-wide data is complex. Some variables follow a Gaussian distribution, while others follow various non-Gaussian distributions. If a process variable follows a Gaussian distribution, the control limit of its monitoring statistic in (8) can be decided by a distribution estimation method. Otherwise, another approach is needed to determine the control limit.

A. Subspace Partition

When establishing process monitoring models, as many process variables as possible are usually collected as monitored variables, to minimize missed detections caused by omitting important variables. The downside is the risk of heterogeneous data characteristics, which may reduce the accuracy of the monitoring model. To reduce process monitoring complexity and build accurate monitoring models for plant-wide processes, subspace partition is conducted in this work. Suppose $X$ and $Y$ denote the process variable dataset and the quality variable dataset as

$$X = [x_1, x_2, \ldots, x_{m_x}] \in \mathbb{R}^{n\times m_x} \tag{10}$$

$$Y = [y_1, y_2, \ldots, y_{m_y}] \in \mathbb{R}^{n\times m_y}. \tag{11}$$

In (10) and (11), $x_i \in \mathbb{R}^{n\times 1}$ $(i = 1, 2, \ldots, m_x)$ is one process variable and $y_j \in \mathbb{R}^{n\times 1}$ $(j = 1, 2, \ldots, m_y)$ is one quality variable, with

$$x_i = [x_i(1), x_i(2), \ldots, x_i(n)]^T \in \mathbb{R}^{n\times 1},\quad i = 1, 2, \ldots, m_x \tag{12}$$

$$y_j = [y_j(1), y_j(2), \ldots, y_j(n)]^T \in \mathbb{R}^{n\times 1},\quad j = 1, 2, \ldots, m_y. \tag{13}$$

To eliminate the influence of data scale, $X$ and $Y$ are standardized using the z-score method. In the online monitoring stage, product quality is often difficult to obtain in real time. Some process variables are related to the quality variables, while others are unrelated or only very weakly related. To establish an accurate quality-related process monitoring model, the process variables need to be divided into quality-related and quality-unrelated process variables. In this article, the elastic network [29], which is a regression model, is used. Compared


with least absolute shrinkage and selection operator (LASSO) regression and ridge regression, the combination of $L_1$ and $L_2$ norms in the elastic network model allows it to retain some general properties of both while also yielding a sparse parameter vector with only a few nonzero entries.

The regression coefficient vector $\beta^j$ is calculated as

$$\beta^j = \arg\min_{\beta^j} \sum_{k=1}^{n}\Big(y_j(k) - \sum_{i=1}^{m_x}\beta_i^j x_i(k)\Big)^2 \quad \text{s.t.}\quad \sum_{i=1}^{m_x}\Big(\phi\,\big|\beta_i^j\big| + (1-\phi)\big(\beta_i^j\big)^2\Big) \le t \tag{14}$$

where $t$ is the tuning parameter and $\phi \in [0, 1]$ weights the $L_1$ and $L_2$ penalties. Once $|\beta_i^j| > \varepsilon$ with $\varepsilon \to 0$, the corresponding process variable $i$ is regarded as related to quality variable $j$.

For one quality variable $y_j$, the corresponding quality-related process variable dataset $X^j$ is determined. For all quality variables, the quality-related process variable dataset $X_q \in \mathbb{R}^{n\times m_q}$ is computed as

$$X_q = X^1 \cup X^2 \cup \cdots \cup X^{m_y} = [x_{q1}, x_{q2}, \ldots, x_{qm_q}] \tag{15}$$

$$x_{qi} = [x_{qi}(1), x_{qi}(2), \ldots, x_{qi}(n)]^T \in \mathbb{R}^{n\times 1},\quad i = 1, 2, \ldots, m_q. \tag{16}$$

The remaining variables form the quality-unrelated process variable dataset $X_u \in \mathbb{R}^{n\times m_u}$:

$$X_u = C_X(X_q) = [x_{u1}, x_{u2}, \ldots, x_{um_u}] \tag{17}$$

$$x_{ui} = [x_{ui}(1), x_{ui}(2), \ldots, x_{ui}(n)]^T \in \mathbb{R}^{n\times 1},\quad i = 1, 2, \ldots, m_u \tag{18}$$

where $C_X(\cdot)$ denotes the complement with respect to $X$.

Among the process variables, some obey a Gaussian distribution and others do not. The control limits of monitoring statistics built on Gaussian variables are convenient to calculate by distribution estimation, whereas those built on non-Gaussian variables are not. Therefore, process variables need to be divided according to whether they obey a Gaussian distribution. To do so, this work uses the Jarque–Bera hypothesis test [30], which tests whether data match the skewness and kurtosis of a normal distribution. The Jarque–Bera statistic is defined as

$$JB(x_i) = \frac{n}{6}\left[\big(S(x_i)\big)^2 + \frac{\big(K(x_i) - 3\big)^2}{4}\right],\quad i = 1, 2, \ldots, m_x \tag{19}$$

where $n$ is the number of samples, $S(x_i)$ is the skewness of $x_i$, and $K(x_i)$ is the kurtosis of $x_i$:

$$S(x_i) = \frac{\sum_{j=1}^{n}\left[x_i(j) - \mathrm{mean}(x_i)\right]^3}{n\left[\mathrm{std}(x_i)\right]^3} \tag{20}$$

$$K(x_i) = \frac{\sum_{j=1}^{n}\left[x_i(j) - \mathrm{mean}(x_i)\right]^4}{n\left[\mathrm{std}(x_i)\right]^4}. \tag{21}$$

If $x_i$ comes from a Gaussian distribution, the Jarque–Bera statistic approximately follows a chi-square distribution with two degrees of freedom, so it can be used to test whether the data are Gaussian. For a Gaussian distribution, the skewness is 0 and the kurtosis is 3. By definition, any deviation from these values (skewness = 0, kurtosis = 3) increases the Jarque–Bera statistic. If the statistic is large, the probability that a chi-square variable exceeds it is very small, and $x_i$ cannot be considered Gaussian. Otherwise, $x_i$ is considered to follow a Gaussian distribution. As a result, the process variable dataset $X$ is divided into Gaussian process variables $X_G$ and non-Gaussian process variables $X_N$.

Finally, the QRGV dataset $X_{qG} \in \mathbb{R}^{n\times m_{qG}}$, QUGV dataset $X_{uG} \in \mathbb{R}^{n\times m_{uG}}$, QRNV dataset $X_{qN} \in \mathbb{R}^{n\times m_{qN}}$, and QUNV dataset $X_{uN} \in \mathbb{R}^{n\times m_{uN}}$ are obtained as

$$X_{qG} = X_q \cap X_G \tag{22}$$

$$X_{uG} = X_u \cap X_G \tag{23}$$

$$X_{qN} = X_q \cap X_N \tag{24}$$

$$X_{uN} = X_u \cap X_N. \tag{25}$$

Each dataset in (22)–(25) represents one subspace, so four subspaces are obtained in total.

Remark 1: The principle of subspace partition based on variable selection is that variables in the same subspace share certain similarities, whereas variables in different subspaces differ significantly. The four subspaces obtained in this work are internally similar and mutually different, which is consistent with this principle. The proposed subspace division separates quality-related from quality-unrelated variables, so that the supervised CCA method can be applied to the quality-related variables to build quality-related and quality-unrelated monitoring statistics, while an unsupervised method is applied to the quality-unrelated variables to build quality-unrelated monitoring statistics. Compared with mixing quality-related and quality-unrelated variables, the proposed subspace division effectively improves the accuracy of the monitoring model. Moreover, Gaussian and non-Gaussian variables are placed in different subspaces to facilitate accurate determination of the control limits.

Remark 2: Although subspace partition through variable selection may break the overall correlation between variables with different distributions, the correlation between variables is not discarded; it is still considered within each subspace. In other words, the correlation among all process variables may not be preserved from a global perspective, but the correlation between variables in each subspace is strengthened from a local point of view.

B. Monitoring Model Construction

Given that the QRGV dataset $X_{qG}$ and the QRNV dataset $X_{qN}$ consist of process variables, quality-unrelated information is


still contained in them. To construct a quality-related monitoring model, quality-related information is extracted via the proposed OCCA algorithm, and the remaining quality-unrelated information is used to establish a quality-unrelated monitoring model. For $X_{qG}$ and $X_{qN}$, QR decomposition is conducted as

$$[Q_{qG}, R_{qG}] = \mathrm{QR}(X_{qG}),\qquad [Q_{qN}, R_{qN}] = \mathrm{QR}(X_{qN}). \tag{26}$$

Similarly, QR decomposition is conducted on $Y$ as

$$[Q_y, R_y] = \mathrm{QR}(Y) \tag{27}$$

where $Q_{qG} \in \mathbb{R}^{n\times n}$, $Q_{qN} \in \mathbb{R}^{n\times n}$, and $Q_y \in \mathbb{R}^{n\times n}$ are unitary matrices, and $R_{qG} \in \mathbb{R}^{n\times m_{qG}}$, $R_{qN} \in \mathbb{R}^{n\times m_{qN}}$, and $R_y \in \mathbb{R}^{n\times m_y}$ are upper triangular. Consider the matrices $(X_{qG}^T X_{qG})^{-1/2} X_{qG}^T Y (Y^T Y)^{-1/2}$ and $(X_{qN}^T X_{qN})^{-1/2} X_{qN}^T Y (Y^T Y)^{-1/2}$, which play the role of (3) in each subspace. Based on $X_{qG} = Q_{qG} R_{qG}$, $X_{qN} = Q_{qN} R_{qN}$, and $Y = Q_y R_y$, it can be obtained that

$$\begin{aligned} (X_{qG}^T X_{qG})^{-1/2} X_{qG}^T Y (Y^T Y)^{-1/2} &= \big[(Q_{qG} R_{qG})^T Q_{qG} R_{qG}\big]^{-1/2} X_{qG}^T Y \big[(Q_y R_y)^T Q_y R_y\big]^{-1/2} \\ &= (R_{qG}^T R_{qG})^{-1/2} X_{qG}^T Y (R_y^T R_y)^{-1/2} \\ &\simeq [R_{qG}(1{:}m_{qG}, :)]^{-T} X_{qG}^T Y [R_y(1{:}m_y, :)]^{-1} \\ &= [Q_{qG}(:, 1{:}m_{qG})]^T Q_y(:, 1{:}m_y) \end{aligned} \tag{28}$$

$$(X_{qN}^T X_{qN})^{-1/2} X_{qN}^T Y (Y^T Y)^{-1/2} \simeq [Q_{qN}(:, 1{:}m_{qN})]^T Q_y(:, 1{:}m_y) \tag{29}$$

where $\simeq$ denotes equality up to orthogonal factors on the left and right, so both sides share the same singular values, that is, the same canonical correlations.

Then, SVD is performed on $[Q_{qG}(:, 1{:}m_{qG})]^T Q_y(:, 1{:}m_y)$ and $[Q_{qN}(:, 1{:}m_{qN})]^T Q_y(:, 1{:}m_y)$ as

$$\begin{aligned} [L_{qG}, D_{qG}, M_{qG}] &= \mathrm{svd}\big([Q_{qG}(:, 1{:}m_{qG})]^T Q_y(:, 1{:}m_y)\big) \\ [L_{qN}, D_{qN}, M_{qN}] &= \mathrm{svd}\big([Q_{qN}(:, 1{:}m_{qN})]^T Q_y(:, 1{:}m_y)\big). \end{aligned} \tag{30}$$

The regression coefficient matrices are calculated as

$$\begin{aligned} \phi_{qG} &= [R_{qG}(1{:}m_{qG}, :)]^{-1} L_{qG}(:, 1{:}m_y) D_{qG}(1{:}m_y, :) \big([R_y(1{:}m_y, :)]^{-1} M_{qG}\big)^{-1} \\ \phi_{qN} &= [R_{qN}(1{:}m_{qN}, :)]^{-1} L_{qN}(:, 1{:}m_y) D_{qN}(1{:}m_y, :) \big([R_y(1{:}m_y, :)]^{-1} M_{qN}\big)^{-1}. \end{aligned} \tag{31}$$

SVD is performed on $\phi_{qG}$ and $\phi_{qN}$ to compute the quality-related projection directions $V_{1qG}$, $V_{1qN}$ and the quality-unrelated projection directions $V_{2qG}$, $V_{2qN}$ as

$$\begin{aligned} \phi_{qG}^T &= U_{qG}\,[\varsigma_{qG}\ \ 0]\,[V_{1qG}\ \ V_{2qG}]^T \\ \phi_{qN}^T &= U_{qN}\,[\varsigma_{qN}\ \ 0]\,[V_{1qN}\ \ V_{2qN}]^T. \end{aligned} \tag{32}$$

The proposed OCCA method thus establishes the regression coefficient matrices $\phi_{qG}$ and $\phi_{qN}$ and performs SVD on them to obtain the quality-related projection directions $V_{1qG}$, $V_{1qN}$ and the quality-unrelated projection directions $V_{2qG}$, $V_{2qN}$. As a result, quality-related and quality-unrelated information can be completely separated.

Finally, the quality-related monitoring statistics $T_{qG}^2$, $T_{qN}^2$ and the quality-unrelated monitoring statistics $D_{qG}^2$, $D_{qN}^2$ are constructed as

$$T_{qG}^2(i) = x_{qG}(i)^T V_{1qG} \big[\mathrm{cov}(X_{qG} V_{1qG})\big]^{-1} V_{1qG}^T x_{qG}(i) \tag{33}$$

$$D_{qG}^2(i) = x_{qG}(i)^T V_{2qG} \big[\mathrm{cov}(X_{qG} V_{2qG})\big]^{-1} V_{2qG}^T x_{qG}(i) \tag{34}$$

$$T_{qN}^2(i) = x_{qN}(i)^T V_{1qN} \big[\mathrm{cov}(X_{qN} V_{1qN})\big]^{-1} V_{1qN}^T x_{qN}(i) \tag{35}$$

$$D_{qN}^2(i) = x_{qN}(i)^T V_{2qN} \big[\mathrm{cov}(X_{qN} V_{2qN})\big]^{-1} V_{2qN}^T x_{qN}(i). \tag{36}$$

Remark 3: Even though the statistics in (33)–(36) do not contain quality variables directly, they reflect whether quality-related and quality-unrelated faults occur, and they solve the real-time problem in the online monitoring stage.

In addition, the quality-unrelated monitoring statistics for $X_{uG}$ and $X_{uN}$ are established as

$$D_{uG}^2(i) = x_{uG}(i)^T \big[\mathrm{cov}(X_{uG})\big]^{-1} x_{uG}(i) \tag{37}$$

$$D_{uN}^2(i) = x_{uN}(i)^T \big[\mathrm{cov}(X_{uN})\big]^{-1} x_{uN}(i). \tag{38}$$

Given that $x_{qG}$ and $x_{uG}$ follow Gaussian distributions, the control limits of $T_{qG}^2$, $D_{qG}^2$, and $D_{uG}^2$ are estimated as

$$(T_{qG}^2)_{\lim} = \frac{m_y (n^2 - 1)}{n (n - m_y)} F_{m_y,\, n - m_y;\, \varphi} \tag{39}$$

$$(D_{qG}^2)_{\lim} = \frac{(m_{qG} - m_y)(n^2 - 1)}{n\big(n - (m_{qG} - m_y)\big)} F_{m_{qG} - m_y,\, n - (m_{qG} - m_y);\, \varphi} \tag{40}$$

$$(D_{uG}^2)_{\lim} = \frac{m_{uG} (n^2 - 1)}{n (n - m_{uG})} F_{m_{uG},\, n - m_{uG};\, \varphi} \tag{41}$$

where $\varphi$ denotes the confidence level. Since $x_{qN}$ and $x_{uN}$ do not follow Gaussian distributions, the control limits $(T_{qN}^2)_{\lim}$, $(D_{qN}^2)_{\lim}$, and $(D_{uN}^2)_{\lim}$ are estimated by kernel density estimation.

C. Bayesian Fusion and Monitoring Logic

To facilitate the monitoring logic, statistics of the same type are integrated through the Bayesian fusion strategy. The Bayesian fusion strategy presented in [31] and [32] is used in this work. Taking the integration of the quality-related monitoring statistics as an example, the Bayesian fusion process


is listed as

$$p(T_{qG}^2 \in F) = p(F \mid T_{qG}^2) = \frac{p(T_{qG}^2 \mid F)\, p(F)}{p(T_{qG}^2)} \tag{42}$$

$$p(T_{qN}^2 \in F) = p(F \mid T_{qN}^2) = \frac{p(T_{qN}^2 \mid F)\, p(F)}{p(T_{qN}^2)} \tag{43}$$

$$p(T_{qG}^2) = p(T_{qG}^2 \mid N)\, p(N) + p(T_{qG}^2 \mid F)\, p(F) \tag{44}$$

$$p(T_{qN}^2) = p(T_{qN}^2 \mid N)\, p(N) + p(T_{qN}^2 \mid F)\, p(F) \tag{45}$$

$$p(T_{qG}^2 \mid N) = \exp\big(-T_{qG}^2 / (T_{qG}^2)_{\lim}\big) \tag{46}$$

$$p(T_{qG}^2 \mid F) = \exp\big(-(T_{qG}^2)_{\lim} / T_{qG}^2\big) \tag{47}$$

$$p(T_{qN}^2 \mid N) = \exp\big(-T_{qN}^2 / (T_{qN}^2)_{\lim}\big) \tag{48}$$

$$p(T_{qN}^2 \mid F) = \exp\big(-(T_{qN}^2)_{\lim} / T_{qN}^2\big) \tag{49}$$

$$\mathrm{BIC}(q) = \frac{p(T_{qG}^2 \mid F)\, p(F \mid T_{qG}^2)}{p(T_{qG}^2 \mid F) + p(T_{qN}^2 \mid F)} + \frac{p(T_{qN}^2 \mid F)\, p(F \mid T_{qN}^2)}{p(T_{qG}^2 \mid F) + p(T_{qN}^2 \mid F)} \tag{50}$$

where BIC(q) is the integrated quality-related monitoring statistic, $N$ denotes the normal condition, $F$ denotes the faulty condition, and $p(N) = \varphi$, $p(F) = 1 - \varphi$. Similarly, the quality-unrelated monitoring statistic BIC(u) is obtained by applying the same Bayesian fusion strategy to the quality-unrelated statistics $D_{qG}^2$, $D_{qN}^2$, $D_{uG}^2$, and $D_{uN}^2$. The control limits of BIC(q) and BIC(u) are both $1 - \varphi$. According to BIC(q), BIC(u), and their control limits $1 - \varphi$, the process operating condition is indicated by the following monitoring logic:

$$\mathrm{BIC}(q) < 1 - \varphi,\ \ \mathrm{BIC}(u) < 1 - \varphi \ \Rightarrow\ \text{normal} \tag{51}$$

$$\mathrm{BIC}(q) > 1 - \varphi,\ \ \mathrm{BIC}(u) > 1 - \varphi \ \Rightarrow\ \text{quality-related fault} \tag{52}$$

$$\mathrm{BIC}(q) < 1 - \varphi,\ \ \mathrm{BIC}(u) > 1 - \varphi \ \Rightarrow\ \text{quality-unrelated fault.} \tag{53}$$

IV. CASE STUDY

In this section, the proposed MOCCA method is evaluated on the Tennessee Eastman (TE) process, a simulation of an actual industrial process. The TE process was proposed by Downs and Vogel of the Eastman Chemical Company. This benchmark model has been widely used in the development, research, and evaluation of control technologies and monitoring methods [33]–[35]. The diagram of the TE process is plotted in Fig. 1. As shown in this figure, the process mainly includes five production units: reactor, condenser, compressor, separator, and stripper. The TE process involves the gaseous reactants A, C, D, and E and two liquid products G and H. F is a liquid by-product, and B is an inert component. The whole process includes 22 measurement variables, 12 manipulated variables, and 19 component variables. In this simulation, all 22 measurement variables are selected as monitored process variables, and components G and H in stream 9 are chosen as monitored quality variables.

Fig. 1. Diagram of TE.

A total of 960 normal samples are simulated as the training dataset. Two testing cases, a quality-unrelated testing case and a quality-related testing case, are simulated as follows.

The quality-unrelated testing case: The process first runs under normal operating conditions, and the reactor cooling water valve sticks from the 160th sample to the 960th sample.

The quality-related testing case: The process first runs under normal operating conditions, and the composition of A, B, and C changes randomly from the 160th sample to the 960th sample.

To test the advantage of the developed MOCCA method, it is compared with the PCR, EPLS [18], and TPLS [13] methods. Moreover, given that quality variables cannot be obtained directly online, the CCA method with the monitoring statistic in (8) is also compared. BIC(q) in MOCCA, $T^2(Q)$ in PCR, $T_u^2$ in EPLS, $T_y^2$ and $Q_r$ in TPLS, and $T_x^2$ in CCA denote quality-related monitoring statistics. BIC(u) in MOCCA, $T^2(NQ)$ in PCR, $T^2$ and $Q$ in EPLS, and $T_o^2$ and $T_r^2$ in TPLS denote quality-unrelated monitoring statistics.

For the quality-unrelated testing case, the false detection rate in the quality-related subspace is the fraction of all 960 testing samples for which the quality-related monitoring statistic exceeds its control limit. The false detection rate in the quality-unrelated subspace is the fraction of the 160 normal testing samples for which the quality-unrelated monitoring statistic exceeds its control limit, and the missed detection rate in the quality-unrelated subspace is the fraction of misjudged samples among the 800 abnormal testing samples.

For the quality-related testing case, the false detection rates in both the quality-related and the quality-unrelated subspaces are the fractions of the 160 normal testing samples for which the monitoring statistic exceeds its control limit, and the missed detection rates in both subspaces are the fractions of the 800 abnormal testing samples for which the monitoring statistic stays below its control limit.

In addition, the accuracy scores of PCR, EPLS, CCA, TPLS, and MOCCA for both the quality-unrelated and the quality-related testing cases are defined as the proportion of the


number of correctly classified samples to the total number of testing samples in the quality-related subspace. When a method has more than one quality-related statistic or quality-unrelated statistic, the accuracy score of each statistic is calculated separately, and the best one is taken as the accuracy score of that method.

Fig. 2 shows the absolute values of the regression coefficients between the process variables and the quality variable G. Some process variables have large regression coefficients with G; for example, the coefficient between process variable 18 (stripper temperature) and G is close to 1. Other process variables have small coefficients; for example, the coefficient between process variable 17 (stripper underflow) and G is close to 0. For such variables, the small nonzero coefficients may be caused by system noise, and the variables are in fact unrelated to the quality variables.

Fig. 3 shows the normal probability plots of two process variables. As shown in this figure, the process variable A + C feed follows a Gaussian distribution, whereas the process variable reactor temperature follows a non-Gaussian distribution. Combining the quality relevance shown in Fig. 2 with the distribution characteristics shown in Fig. 3, the TE process variables can be clearly divided into QRGV, QUGV, QRNV, and QUNV.

Fig. 2. Absolute values of regression coefficients between process variables and quality variable G.

Fig. 3. Probability plot for normal distribution of two process variables.

A. Quality-Unrelated Testing Case

For this testing case, the reactor cooling water valve is sticking. The fault causes violent oscillations in the cooling water flow, which in turn cause severe fluctuations in the reactor temperature. Owing to the control loop, however, the fault never affects product quality. Fig. 4 gives the actual trajectories of the quality variables G and H and of the reactor temperature. As presented in Fig. 4, G and H remain at the same level even after the fault occurs, whereas the reactor temperature changes from the 160th sample onward. Hence, this testing case is a quality-unrelated fault.

Fig. 4. Actual trajectory of quality variables and the reactor temperature variable for the quality-unrelated testing case.

The monitoring results of PCR, EPLS, CCA, TPLS, and MOCCA for the quality-unrelated testing case are given in Figs. 5 and 6, where blue represents normal data, red represents abnormal data, and green represents misjudged data. Table I shows the false detection rates of BIC(q) in MOCCA, $T^2(Q)$ in PCR, $T_u^2$ in EPLS, $T_x^2$ in CCA, and $T_y^2$ and $Q_r$ in TPLS for the quality-unrelated testing case. The false detection rates of $T_u^2$ in EPLS and $T_x^2$ in CCA are 20.9375% and 21.5625%, respectively; as shown in Fig. 5(c) and (f), these two statistics exceed the control limit for more than 20% of the abnormal data. From Fig. 5(g) and (j), the statistic values of more than half of the abnormal data exceed the control limit, and the false detection rates of $T_y^2$ and $Q_r$ in TPLS are





56.98% and 73.02%, respectively. Such excessive false detection rates indicate that the monitoring results of EPLS, CCA, and TPLS are inconsistent with the actual trajectories of the quality variables. In contrast, the false detection rate of BIC(q) in MOCCA is only 1.25%, and no continuous anomalies are detected in Fig. 6(a).

Moreover, Tables II and III list the false detection rates and missed detection rates of BIC(u) in MOCCA, $T^2(NQ)$ in PCR, $T^2$ and $Q$ in EPLS, and $T_o^2$ and $T_r^2$ in TPLS for the quality-unrelated testing case. All false detection rates listed in Table II are lower than 5%, which is acceptable. The missed detection rates of both $Q$ in EPLS and BIC(u) in MOCCA are 0%, indicating that both EPLS and MOCCA detect this fault promptly and obtain the best monitoring results in the quality-unrelated subspace. Table IV lists the accuracy scores of PCR, EPLS, CCA, TPLS, and the proposed MOCCA method; as shown in this table, MOCCA obtains the best accuracy for the quality-unrelated testing case.

In summary, the false detection rates of EPLS, CCA, and TPLS in Table I are unsatisfactory. Both EPLS and MOCCA obtain the lowest missed detection rates in the quality-unrelated subspace, and MOCCA achieves the smallest missed detection rate in Table III without sacrificing the false detection rates in Tables I and II. Moreover, MOCCA has the best accuracy score in Table IV. Taken together, these results show that MOCCA is more reasonable and advantageous than PCR, EPLS, CCA, and TPLS for quality-unrelated fault detection.

Fig. 5. Monitoring results of PCR, EPLS, CCA, and TPLS for the quality-unrelated testing case.

Fig. 6. Monitoring results of MOCCA for the quality-unrelated testing case.

B. Quality-Related Testing Case

In the quality-related testing case, the composition of A, B, and C changes randomly, which causes dramatic changes in the product quality variables. The actual trajectories of the quality variables G and H for this case are presented in Fig. 7. Compared with the normal condition, both the static values and the dynamic behavior of the two quality variables change obviously once the abnormality occurs at the 160th sample. Hence, this testing case is a quality-related fault.

The monitoring results of PCR, EPLS, CCA, TPLS, and MOCCA for the quality-related testing case are given in Figs. 8 and 9. Tables V and VI give the false detection rates and missed detection rates of BIC(q) in MOCCA, $T^2(Q)$ in PCR, $T_u^2$ in EPLS, $T_x^2$ in CCA, and $T_y^2$ and $Q_r$ in TPLS for the quality-related testing case. All false detection rates listed in Table V are lower than 5%, which is satisfactory. In Fig. 8(j), $Q_r$ in TPLS almost fails to detect this abnormal condition, and


69.75% of the abnormal testing samples are misjudged as normal. Compared with $Q_r$ in TPLS, $T^2(Q)$ in PCR, $T_u^2$ in EPLS, $T_x^2$ in CCA, and $T_y^2$ in TPLS perform better, with missed detection rates below 20%. BIC(q) in MOCCA performs best among all compared statistics, with a missed detection rate of 5.75% in the quality-related subspace. Table VII lists the accuracy scores of PCR, EPLS, CCA, TPLS, and the proposed MOCCA method; it shows that MOCCA obtains the best accuracy score for the quality-related testing case.

Fig. 7. Actual trajectory of quality variables for the quality-related testing case.

Fig. 9. Monitoring results of MOCCA for the quality-related testing case.
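The evaluation indices used throughout this section (false detection rate, missed detection rate, and accuracy score) can be made concrete with a short Python sketch. It assumes that a monitoring statistic and its control limit are already available for every test sample; the function names `detection_rates` and `accuracy_score`, the `should_alarm` labels, and the synthetic data are hypothetical and do not come from the original study. Following the definitions given earlier, a false detection is an alarm on a sample that should be normal, a missed detection is an abnormal sample whose statistic does not exceed the control limit, and the accuracy score of a method is its best per-statistic accuracy in a subspace.

```python
import numpy as np


def detection_rates(stat, limit, should_alarm):
    """Return (false detection rate, missed detection rate) in percent.

    stat         : monitoring statistic of every test sample
    limit        : control limit of that statistic
    should_alarm : True where the sample is abnormal for this subspace (hypothetical labels)
    """
    alarm = np.asarray(stat, dtype=float) > limit
    should_alarm = np.asarray(should_alarm, dtype=bool)
    normal, faulty = ~should_alarm, should_alarm
    # False detection: alarms raised on samples that should be normal.
    fdr = 100.0 * alarm[normal].mean() if normal.any() else 0.0
    # Missed detection: abnormal samples whose statistic does not exceed the limit.
    mdr = 100.0 * (~alarm[faulty]).mean() if faulty.any() else 0.0
    return float(fdr), float(mdr)


def accuracy_score(stats_with_limits, should_alarm):
    """Best per-statistic accuracy (%) of one method in one subspace."""
    should_alarm = np.asarray(should_alarm, dtype=bool)
    return max(
        float(100.0 * ((np.asarray(s, dtype=float) > lim) == should_alarm).mean())
        for s, lim in stats_with_limits
    )


# Synthetic layout: 160 normal samples followed by 800 abnormal samples.
should_alarm = np.r_[np.zeros(160, dtype=bool), np.ones(800, dtype=bool)]
stat = np.where(should_alarm, 2.0, 0.5)  # alarm on abnormal samples ...
stat[160:206] = 0.5                      # ... except 46 missed ones
stat[:2] = 2.0                           # and 2 false alarms on normal samples
fdr, mdr = detection_rates(stat, 1.0, should_alarm)
print(round(fdr, 2), round(mdr, 2))      # 1.25 5.75
print(round(accuracy_score([(stat, 1.0)], should_alarm), 2))  # 95.0
```

With 800 abnormal samples, as in the testing cases above, each missed sample adds 0.125 percentage points to the missed detection rate, so 46 missed samples correspond to 46/800 = 5.75%.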
Fig. 8. Monitoring results of PCR, EPLS, CCA, and TPLS for the quality-related testing case.

TABLE V

In conclusion, PCR, EPLS, CCA, TPLS, and the proposed MOCCA method all classify this testing case as a quality-related fault, in accordance with the actual situation. Compared with PCR, EPLS, CCA, and TPLS, the smallest missed detection rate obtained by BIC(q) and the best accuracy score obtained by MOCCA demonstrate the superiority of the proposed method for quality-related fault detection.
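The three-way judgment used by MOCCA (normal, quality-related fault, or quality-unrelated fault) can be illustrated with a minimal Python sketch. It assumes the Bayesian inference-based fusion commonly used to combine monitoring statistics (see, e.g., [31]), in which a statistic $s$ with control limit $l$ is mapped to the likelihoods $\exp(-s/l)$ under normal operation and $\exp(-l/s)$ under a fault, with the significance level $\alpha$ as the fault prior. This form, the function names `bayesian_fusion` and `monitoring_logic`, and the rule that a quality-related alarm takes precedence are assumptions for illustration, not the exact MOCCA formulation.

```python
import numpy as np


def bayesian_fusion(stats, limits, alpha=0.01):
    """Fuse several monitoring statistics of one sample into one index.

    Assumed form: statistic s_i with control limit l_i gives
    P(x|N) = exp(-s_i/l_i) and P(x|F) = exp(-l_i/s_i), with priors
    P(N) = 1 - alpha and P(F) = alpha.
    """
    s = np.asarray(stats, dtype=float)
    l = np.asarray(limits, dtype=float)
    p_x_n = np.exp(-s / l)                       # likelihood under normal operation
    p_x_f = np.exp(-l / np.maximum(s, 1e-12))    # likelihood under fault (guards s = 0)
    total = p_x_f.sum()
    if total == 0.0:                             # all statistics far below their limits
        return 0.0
    p_x = p_x_n * (1.0 - alpha) + p_x_f * alpha  # evidence P(x)
    p_f_x = p_x_f * alpha / p_x                  # posterior fault probability P(F|x)
    weights = p_x_f / total                      # weight each statistic by its fault likelihood
    return float(np.sum(weights * p_f_x))


def monitoring_logic(bic_q, bic_u, alpha=0.01):
    """Hypothetical three-way decision; a quality-related alarm takes precedence."""
    if bic_q > alpha:
        return "quality-related fault"
    if bic_u > alpha:
        return "quality-unrelated fault"
    return "normal"


alpha = 0.01
# Synthetic sample: quality-related statistics well below their limits,
# quality-unrelated statistics well above theirs.
bic_q = bayesian_fusion([0.2, 0.3], [1.0, 1.0], alpha)
bic_u = bayesian_fusion([5.0, 4.0], [1.0, 1.0], alpha)
print(monitoring_logic(bic_q, bic_u, alpha))  # quality-unrelated fault
```

A useful sanity check of this form is that when every statistic sits exactly on its control limit, both likelihoods equal $e^{-1}$ and the fused index equals $\alpha$, which is why $\alpha$ can serve as the control limit of the fused index.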

TABLE VI
MISSED DETECTION RATES (%) OF QUALITY-RELATED SUBSPACE

This article proposed a novel quality-related plant-wide process monitoring method. It could not only detect the occurrence of a fault but also determine whether the fault affects product quality. To establish an accurate monitoring model for the plant-wide process, the original variable space was partitioned into four subspaces, each grouping variables with similar data characteristics. Compared with traditional methods, the proposed MOCCA method focused on quality-related information as well

as quality-unrelated information. Based on the constructed monitoring statistics and the Bayesian fusion strategy, a monitoring logic was established that judges the process operating condition as normal, quality-related fault, or quality-unrelated fault.

Finally, to show the effectiveness and superiority of the proposed MOCCA method, the TE process was used with both a quality-unrelated testing case and a quality-related testing case. Compared with several state-of-the-art methods, namely PCR, EPLS, CCA, and TPLS, the monitoring results of the two testing cases showed that the developed MOCCA method achieved the best accuracy score and rich monitoring information for both quality-unrelated and quality-related fault detection.

REFERENCES

[1] S. X. Ding, S. Yin, K. Peng, H. Hao, and B. Shen, “A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill,” IEEE Trans. Ind. Informat., vol. 9, no. 4, pp. 2239–2247, Nov. 2013.
[2] S. M. Zhang and C. H. Zhao, “Slow-feature-analysis-based batch process monitoring with comprehensive interpretation of operation condition deviation and dynamic anomaly,” IEEE Trans. Ind. Electron., vol. 66, no. 5, pp. 3773–3783, May 2019.
[3] K. Zhang, K. X. Peng, and J. Dong, “A common and individual feature extraction-based multimode process monitoring method with application to the finishing mill process,” IEEE Trans. Ind. Informat., vol. 14, no. 11, pp. 4841–4850, Nov. 2018.
[4] Z. Q. Ge, “Distributed predictive modeling framework for prediction and diagnosis of key performance index in plant-wide processes,” J. Process Control, vol. 65, pp. 107–117, May 2018.
[5] J. MacGregor, C. Jaeckle, C. Kiparissides, and M. Koutoudi, “Process monitoring and diagnosis by multiblock PLS methods,” AIChE J., vol. 40, pp. 826–838, May 1994.
[6] Z. Q. Ge, “Review on data-driven modeling and monitoring for plant-wide industrial processes,” Chemom. Intell. Lab. Syst., vol. 171, pp. 16–25, Sep. 2017.
[7] Q. C. Jiang, X. F. Yan, and B. Huang, “Performance-driven distributed PCA process monitoring based on fault-relevant variable selection and Bayesian inference,” IEEE Trans. Ind. Electron., vol. 63, no. 1, pp. 377–386, Jan. 2016.
[8] B. Song and H. B. Shi, “Fault detection and classification using quality-supervised double-layer method,” IEEE Trans. Ind. Electron., vol. 65, no. 10, pp. 8163–8172, Oct. 2018.
[9] K. Zhang, K. X. Peng, S. X. Ding, Z. W. Chen, and X. Yang, “A correlation-based distributed fault detection method and its application to a hot tandem rolling mill process,” IEEE Trans. Ind. Electron., vol. 67, no. 3, pp. 2380–2390, Mar. 2020.
[10] C. H. Zhao and Y. X. Sun, “Multispace total projection to latent structures and its application to online process monitoring,” IEEE Trans. Control Syst. Technol., vol. 22, no. 3, pp. 868–883, May 2014.
[11] X. F. Yuan, L. Li, and Y. L. Wang, “Nonlinear dynamic soft sensor modeling with supervised long short-term memory network,” IEEE Trans. Ind. Informat., vol. 16, no. 5, pp. 3168–3176, May 2020.
[12] B. Song, X. G. Zhou, H. B. Shi, and Y. Tao, “Performance-indicator-oriented concurrent subspace process monitoring method,” IEEE Trans. Ind. Electron., vol. 66, no. 7, pp. 5535–5545, Jul. 2019.
[13] D. H. Zhou, G. Li, and S. J. Qin, “Total projection to latent structures for process monitoring,” AIChE J., vol. 56, no. 1, pp. 168–178, Jan. 2010.
[14] G. Li, B. S. Liu, S. J. Qin, and D. H. Zhou, “Quality relevant data-driven modeling and monitoring of multivariate dynamic processes: The dynamic T-PLS approach,” IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 2262–2271, Dec. 2011.
[15] G. Wang and S. Yin, “Quality-related fault detection approach based on orthogonal signal correction and modified PLS,” IEEE Trans. Ind. Informat., vol. 11, no. 2, pp. 398–405, Apr. 2015.
[16] S. Yin, X. P. Zhu, and O. Kaynak, “Improved PLS focused on key-performance-indicator-related fault diagnosis,” IEEE Trans. Ind. Electron., vol. 62, no. 3, pp. 1651–1658, Mar. 2015.
[17] S. J. Qin and Y. Zheng, “Quality-relevant and process-relevant fault monitoring with concurrent projection to latent structures,” AIChE J., vol. 59, no. 2, pp. 496–504, Feb. 2013.
[18] K. X. Peng, K. Zhang, B. Yu, and J. Dong, “Quality-relevant fault monitoring based on efficient projection to latent structures with application to hot strip mill process,” IET Control Theory Appl., vol. 9, no. 7, pp. 1135–1145, May 2015.
[19] Z. Q. Ge, B. Huang, and Z. H. Song, “Mixture semisupervised principal component regression model and soft sensor application,” AIChE J., vol. 60, no. 2, pp. 533–545, Feb. 2014.
[20] X. F. Yuan, Z. Q. Ge, and Z. H. Song, “Locally weighted kernel principal component regression model for soft sensing of nonlinear time-variant processes,” Ind. Eng. Chem. Res., vol. 53, pp. 13736–13749, Sep. 2014.
[21] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, “Canonical correlation analysis: An overview with application to learning methods,” Neural Comput., vol. 16, no. 12, pp. 2639–2664, Dec. 2004.
[22] Z. W. Chen, S. X. Ding, K. Zhang, Z. B. Li, and Z. K. Hu, “Canonical correlation analysis-based fault detection methods with application to alumina evaporation process,” Control Eng. Pract., vol. 46, pp. 51–58, Oct. 2015.
[23] Z. W. Chen, S. X. Ding, T. Peng, C. H. Yang, and W. H. Gui, “Fault detection for non-Gaussian processes using generalized canonical correlation analysis and randomized algorithms,” IEEE Trans. Ind. Electron., vol. 65, no. 2, pp. 1559–1567, Feb. 2018.
[24] Z. W. Chen, K. Zhang, S. X. Ding, Y. A. W. Shardt, and Z. K. Hu, “Improved canonical correlation analysis-based fault detection methods for industrial processes,” J. Process Control, vol. 41, pp. 26–34, Mar. 2016.
[25] Q. Q. Zhu, Q. Liu, and S. J. Qin, “Concurrent quality and process monitoring with canonical correlation analysis,” J. Process Control, vol. 60, pp. 95–103, Dec. 2017.
[26] Q. Liu, S. J. Qin, and T. Y. Chai, “Unevenly sampled dynamic data modeling and monitoring with an industrial application,” IEEE Trans. Ind. Informat., vol. 13, no. 5, pp. 2203–2213, Oct. 2017.
[27] Y. Q. Liu, B. Liu, X. J. Zhao, and M. Xie, “A mixture of variational canonical correlation analysis for nonlinear and quality-relevant process monitoring,” IEEE Trans. Ind. Electron., vol. 65, no. 8, pp. 6478–6486, Aug. 2018.
[28] Q. C. Jiang, S. X. Ding, Y. Wang, and X. F. Yan, “Data-driven distributed local fault detection for large-scale processes based on GA-regularized canonical correlation analysis,” IEEE Trans. Ind. Electron., vol. 64, no. 10, pp. 8148–8157, Oct. 2017.
[29] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” J. R. Statist. Soc. B, vol. 67, pp. 301–320, 2005.
[30] C. M. Jarque and A. K. Bera, “A test for normality of observations and regression residuals,” Int. Stat. Rev., vol. 55, pp. 163–172, 1987.
[31] Z. Q. Ge, M. G. Zhang, and Z. H. Song, “Nonlinear process monitoring based on linear subspace and Bayesian inference,” J. Process Control, vol. 20, pp. 676–688, Jun. 2010.
[32] Q. Jiang and X. Yan, “Monitoring multi-mode plant-wide processes by using mutual information-based multi-block PCA, joint probability, and Bayesian inference,” Chemom. Intell. Lab. Syst., vol. 136, pp. 121–137,
[33] S. Yin, H. Luo, and S. X. Ding, “Real-time implementation of fault-tolerant control systems with performance optimization,” IEEE Trans. Ind. Electron., vol. 61, no. 5, pp. 2402–2411, May 2014.
[34] Z. Chai and C. H. Zhao, “Enhanced random forest with concurrent analysis of static and dynamic nodes for industrial fault classification,” IEEE Trans. Ind. Informat., vol. 16, no. 1, pp. 54–66, Jan. 2020.
[35] H. Luo, H. Zhao, and S. Yin, “Data-driven design of fog-computing-aided process monitoring system for large-scale industrial processes,” IEEE Trans. Ind. Informat., vol. 14, no. 10, pp. 4631–4641, Oct. 2018.

Bing Song received the B.E. degree in automation and the Ph.D. degree in control theory and control engineering from the East China University of Science and Technology, Shanghai, China, in 2012 and 2017, respectively.
Currently, he is an Associate Professor with the Department of Automation, East China University of Science and Technology. His current research interests include feature extraction, fault detection, fault diagnosis, and multimode process monitoring.


Hongbo Shi received the B.E. degree in chemical automation and the Ph.D. degree in control theory and control engineering from the East China University of Science and Technology, Shanghai, China, in 1986 and 2000, respectively.
Currently, he is a Professor with the East China University of Science and Technology. His current research interests include modeling of industrial process and advanced control technology, theory and methods of integrated automation systems, condition monitoring and fault diagnosis of industrial
Prof. Shi was the 2003 Shu Guang Scholar of Shanghai.

Shuai Tan received the B.S. degree in automation and the Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2005 and 2012, respectively.
Currently, she is an Associate Professor with the East China University of Science and Technology, Shanghai, China. Her current research interests include operation state evaluation for complex industrial process, fault monitoring and diagnosis, and machine learning of image

Yang Tao received the B.E. degree in automation from Zhengzhou University, Zhengzhou, China, in 2015, and the Ph.D. degree in control theory and control engineering from the East China University of Science and Technology, Shanghai, China, in 2020.
Currently, he is a Postdoctoral Researcher with the Department of Automation, East China University of Science and Technology. His current research interests include feature extraction, process operating performance assessment, fault detection, and fault diagnosis.
