
Int. J. Modelling, Identification and Control, Vol. 34, No. 2, 2020

A review on data-driven approaches for industrial process modelling

Wei Guo and Tianhong Pan*


School of Electrical Engineering and Automation,
Anhui University,
Hefei, Anhui 230601, China
and
School of Electrical Information and Engineering,
Jiangsu University,
Zhenjiang, Jiangsu 212013, China
Email: guow1989@163.com
Email: thpan@ahu.edu.cn
*Corresponding author

Zhengming Li
School of Electrical Information and Engineering,
Jiangsu University,
Zhenjiang, Jiangsu 212013, China
Email: lzming@ujs.edu.cn

Guoquan Li
Jiangsu Hengshun Vinegar Industry Co., Ltd.,
Zhenjiang, Jiangsu 212043, China
Email: 13775556788@139.com

Abstract: Data-driven techniques for industrial processes have attracted continuous attention over the past decades. However, many challenging issues arise in this field when the collected data present different characteristics. To outline the principles of different modelling methods under various working conditions, this paper reviews data-driven modelling methods from the perspectives of data structures and model structures. Firstly, the data collection and pre-processing procedures are examined. Then, popular methods are discussed, ranging from linear methods (including multivariate linear regression (MLR), latent variable projection (LVP), etc.) to nonlinear methods (including artificial intelligence, Gaussian process regression (GPR), local models, etc.). Finally, model calibration strategies (including offset-based, recursive, and moving window methods) are also reviewed. The main purpose is to support technical users in industrial process modelling by providing a set of data-driven methods.

Keywords: data-driven modelling; industrial process; machine learning; data analytics; model structure.

Reference to this paper should be made as follows: Guo, W., Pan, T., Li, Z. and Li, G. (2020)
‘A review on data-driven approaches for industrial process modelling’, Int. J. Modelling,
Identification and Control, Vol. 34, No. 2, pp.75–89.

Biographical notes: Wei Guo received his BSc in Electronic Information Engineering from Jingjiang College of Jiangsu University, Zhenjiang, China, and his MS from the School of Electrical Information and Engineering, Jiangsu University, Zhenjiang, China, in 2012 and 2016, respectively. He is currently pursuing a PhD in Control Science and Engineering at Jiangsu University. His research interests include process modelling, clustering algorithms, and artificial intelligence.

Tianhong Pan received his BSc in Mechatronics Technology and Applications from Anhui Agriculture University, Hefei, China; the MS in Power Electronics and Electric Power Drives from the Gansu University of Technology, Lanzhou, China; and the PhD in Control Theory and Control Engineering from Shanghai Jiao Tong University, Shanghai, China, in 1997, 2000, and 2007, respectively. He has been a Professor with the School of Electrical Engineering and Automation, Anhui University, Hefei, Anhui, China. His research interests include multiple model approach and its application, machine learning, virtual metrology, predictive control and run-to-run control theory and practice, etc.

© 2020 Inderscience Enterprises Ltd.

Zhengming Li received his BSc in Automation from Jiangsu University, Zhenjiang, China, in 1982 and his MS in Control Theory and Engineering from Xi’an Jiao Tong University, Xi’an, China, in 1988. He has been a Professor in the School of Electrical and Information Engineering, Jiangsu University. His research interests include R2R control theory and practice, complex process modelling and control, and system identification.

Guoquan Li has been a Director in the Jiangsu Hengshun Vinegar Industry Co., Ltd.

1 Introduction

With the development of computer science and the related measurement techniques, the collection and storage of operating data in industrial processes have become far easier. This convenience also provides adequate support for databases, data mining, and data-driven technologies. Compared with the traditional knowledge-based method, the data-driven method offers more flexibility in constructing models of industrial process systems. Therefore, the data-driven method has been extensively used in industrial processes for process modelling, controller design, and system optimisation (Ahmad et al., 2014; Liu et al., 2019a; Ge, 2017; Gopakumar et al., 2018; Smith and Jin, 2014; Tamboli and Chile, 2018).

By mining the information in the collected data, various machine learning algorithms can be employed for industrial process monitoring (Fezai et al., 2018; Jiang et al., 2016; Peng et al., 2015; Qin, 2012; Wang et al., 2019a), soft sensing (Kadlec et al., 2009; Kaneko and Funatsu, 2014; Yao and Ge, 2017; Yuan et al., 2018), and stage division (Frigieri et al., 2016; Quiñones-Grueiro et al., 2019; Zhao et al., 2014; Zhu et al., 2011). Machine learning algorithms can be summarised into three categories: supervised learning, unsupervised learning, and semi-supervised learning. Among them, unsupervised and supervised learning are the most widely adopted and may account for 80–90% of applications (Ge et al., 2017). In supervised learning, data consisting of inputs along with their corresponding labelled targets are used for model construction. Such targets may include process outputs, key variable indices, product quality, fault patterns, etc. The soft sensor is a typical application of supervised learning and is a regression model (Kadlec et al., 2009; Shao and Tian, 2015). If the labelled target consists of discrete values, supervised learning can be applied to fault diagnosis (Ge et al., 2017). Several supervised learning methods are widely used in industrial processes, including multivariate linear regression (MLR) (Huang and Zhao, 2018), canonical correlation analysis (CCA) (Zhu et al., 2017), principal component regression (PCR) (Yuan et al., 2014a), partial least squares (PLS) (Ahmed et al., 2009), artificial intelligence (AI) (Smith and Jin, 2014), Gaussian process regression (GPR) (Zhou et al., 2015), random forest (RF) (Ahmad et al., 2014), and support vector machine (SVM) (Pooyan et al., 2015). Different from the supervised learning framework, unsupervised learning methods work on data without labels. Correspondingly, unsupervised learning methods have been employed to reveal hidden information in process data, such as control limits of normal working conditions, distributions of clusters, outlier criteria, and latent structures. For example, a fault detection model can detect an abnormal event in an industrial process by using the Q-statistic and Hotelling’s T² index (Liu et al., 2014), and a clustering algorithm can simplify a complex data distribution (Khediri et al., 2012). Several well-known unsupervised learning algorithms are used in the process industry, including principal component analysis (PCA) (Du et al., 2017), factor analysis (FA) (Ge and Song, 2012), independent component analysis (ICA) (Liu and Zhang, 2015), quality trend analysis (QTA) (Zhao et al., 2017), clustering algorithms (Zhu et al., 2011), and autoencoders (Shang et al., 2014). Recently, semi-supervised learning has gained much attention in industrial processes. It serves a purpose similar to supervised learning; however, the training data sometimes consist of a small volume of expensive labelled measurements and a large number of cheap unlabelled measurements, because the high cost of data labelling rules out training on fully labelled samples (Ge and Song, 2012). Semi-supervised learning therefore bridges supervised and unsupervised learning. Moreover, the two learning frameworks can also help each other in process modelling. For example, clustering algorithms can decompose a complex soft sensing model into several simple models, and predicted key variables can also be added into the calculation of control limits. The relationship among unsupervised, supervised, and semi-supervised learning is presented in Figure 1.

To meet the high requirements of industrial production, many modelling structures have been proposed, such as lazy learning (Liu et al., 2012), multi-model (Stoica et al., 2004), fuzzy model (Qi et al., 2014), ensemble learning (Kaneko and Funatsu, 2014), and transfer learning (Liu et al., 2019b). A wide variety of methods can be constructed by combining different model structures and algorithms. Therefore, this paper provides a systematic review of data-driven process modelling methods from the viewpoints of basic algorithms and

model structures. Under each topic, the principles of different methods or structures are discussed, together with several corresponding industrial applications.

Figure 1 The relationship of three modelling categories (see online version for colours)
[Diagram: semi-supervised learning draws on both unlabeled and labeled data; unsupervised learning (PCA, ICA, FA, k-means, etc.) uses unlabeled data; supervised learning (PCR, PLS, SVM, GPR, etc.) uses labeled data.]

The rest of this paper is organised as follows. The main methodology of process modelling is provided in Section 2. Several approaches to data-driven modelling are reviewed in Section 3, including mathematical methods (Section 3.1) and model structures (Section 3.2). In Section 4, some perspectives on process modelling for further applications are presented. Finally, conclusions are drawn in Section 5.

2 Methodology

The general procedure of process modelling includes data collection and pre-processing, structure and model selection, model training, validation, and the corresponding maintenance. An overview of process modelling is presented in Figure 2, in which the main methodological steps are linked by black arrows. It can be seen that the modelling process is divided into two parts:

• modelling preparation (i.e., data collection and its pre-processing, and model selection)
• process modelling and analysis (i.e., model training, model analysis, and model maintenance).

Note that a large body of work related to each step has been published in the last decades; only functional descriptions are given in this section, and the details of the basic approaches are presented in Section 3.

2.1 Data collection

As the first step in developing a data-driven model, data collection gathers modelling data from a historical database. To construct an effective training model, attention should be paid to the operating region of the process, so that unrealistic samples recorded by faulty sensors are not selected. Moreover, the type of dataset selected differs with the modelling purpose. For example, in soft sensing or fault detection, data from stable operating conditions should be selected, whereas for fault diagnosis models, abnormal data should be collected as well. Another important issue is variable selection. Typically, the variables measured in an industrial process may be collinear and redundant. There are two ways to handle the collinearity problem. One is to use statistical analysis methods, such as PCA (Jiang et al., 2016), PLS (Rinnan et al., 2014), and CCA (Ogura, 2010). The other is to use computational learning methods for feature selection, such as support vector regression (SVR) (Ben Ishak, 2016), genetic algorithms (GA) (Lu et al., 2016), and RF (Hapfelmeier and Ulm, 2013).

2.2 Data pre-processing

Data collection in industrial processes always comes with a series of well-known problems, including outliers, missing data, and so on. Therefore, data pre-processing is required to improve data quality. Outliers can be caused by measurement noise, communication errors, or sensor degradation. The 3σ rule, which flags samples lying more than three standard deviations from the mean, is the best-known univariate approach to outlier detection. Multivariate approaches usually exploit the interaction among variables, such as distance, angle, and density (Souza et al., 2016). On the other hand, some values may fail to be recorded. Missing values are mainly caused by outlier deletion or incomplete transmission in the data collection step. Observations containing missing values can be discarded directly if only a small number of them are missing. Otherwise, the missing values need to be imputed using maximum likelihood (ML), expectation-maximisation (EM), regression methods, etc. (Souza et al., 2016; Zhu et al., 2018). In addition, sampling rates, measurement delays, dynamic characteristics, and data normalisation also need to be taken into consideration (Kadlec et al., 2009; Qin, 2012).

2.3 Model selection

Before the training step, a critical issue has to be addressed, i.e., model selection. As mentioned in the previous section, a large number of models can be constructed by combining different model structures and mathematical methods. Two thorny issues then arise: what kind of model should be employed, and how complex does the model structure need to be? The most complex model is not necessarily the best one. Taking regression-based modelling as an example, a nonlinear model is unnecessary when the relationship between process variables and key variables is linear. Model selection is not only closely related to the modelling purpose but also associated with data relationships and characteristics (Ge, 2017). Consequently,
there is no unified guidance for selecting a structure or a mathematical method. However, several useful criteria, such as the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), can assist model selection (Bishop, 2006).

Figure 2 Overview of methodology for industrial processes modelling (see online version for colours)
[Diagram labels: Historical Database; Data Collection and Pre-processing (Sample Selection, Variable Selection, Outlier Detection, Missing Data); Structures Selection; Mathematical Methods; Model Selection (Supervised, Unsupervised, and Semi-supervised Learning); Training; Constructed Model; Query Data Collection; Model Maintenance; Feedback Data Collection; Model Analysis (Key Variable Prediction, Quality Assessment, Pattern Analysis, Inner-phase Analysis, Fault Detection, Fault Diagnosis, Fault Isolation, Correlation Analysis).]

2.4 Model training and model validation

Once the modelling structure has been determined, the data-driven model is trained using the results of all previous steps. In the training step, different algorithms depend on their own internal parameter structures. Training can be formulated as an optimisation problem that yields optimal parameters, such as regression coefficients, control limits, cluster centres, density distributions, decision policies, and stage divisions. The obtained model needs to be validated before online application. Model validation mainly consists of two steps, namely, self-validation and cross-validation (Khatibisepehr et al., 2013). Self-validation can exclude an inadequate model obtained in the previous step. Cross-validation assesses the generalisation capability by analysing the identification performance on data that were not used during the training phase. Other validation methods are presented in Bishop (2006).

2.5 Model maintenance and calibration

To deal with slow changes caused by ageing equipment or climate variation, or to adapt to new working conditions in industrial processes, a good data-driven model should include a mechanism for online self-maintenance. Two schemes are mainly used for model calibration: in the first, the model is adjusted only if a significant deviation from the identified model is detected during online application (Ma et al., 2009); in the second, calibration continues until the feedback terminates (Qin, 1998). Note that feedback samples must satisfy the same criteria as those used in data collection and data pre-processing.

3 Algorithms review

A systematic review of machine learning algorithms for industrial process modelling is presented in this section. As mentioned before, applications of data-driven modelling are mainly partitioned into three types, i.e., unsupervised, supervised, and semi-supervised learning frameworks. However, a single framework is often not used on its own to address the whole modelling process. Therefore, mathematical methods, including linear models, nonlinear models, and adaptive models, are reviewed first. Then, a number of optional model structures built on these mathematical methods are described. Detailed reviews are provided as follows.

3.1 Mathematical methods

3.1.1 Linear models

(1) Multivariate linear regression (MLR)

MLR models the linear relationship between one dependent variable and a number of independent variables (Abyaneh, 2014). Ordinary least squares (OLS) regression is the best-known approach and has indirectly promoted the development of other algorithms. Because OLS suffers from numerical problems when the independent variables are strongly collinear (the matrix XᵀX becomes ill-conditioned), several feature selection methods have been presented, such as stepwise regression (Rothman et al., 2013), genetic algorithm
(Lu et al., 2016), etc. Alternatively, ridge regression and latent variable methods can also deal with the collinearity problem (Piepho, 2009). Most applications of MLR focus on key variable prediction. For example, Wang et al. (2019b) applied MLR to predict influent chemical oxygen demand and total phosphorus in wastewater treatment plants. Liu et al. (2018a) created a multivariate phase space by referencing the phase state and predicted very-short-term wind power. Besides prediction, MLR can also be applied to process monitoring (Eyvazian et al., 2011).

(2) Latent variable projection (LVP) model

The latent variable projection (LVP) model is the most popular way to deal with the numerical problems of high-dimensional or highly collinear data: it projects multidimensional process variables from the original space onto a subspace. LVP models mainly include PCA, PCR, PLS, and PLS regression (PLSR). In PCA, there are three ways to compute the mutually uncorrelated principal components (PCs) and the loading matrix, i.e., nonlinear iterative partial least squares (NIPALS), eigenvalue decomposition, and singular value decomposition (SVD) (Geladi and Kowalski, 1986; Qin, 2012). PCA can be regarded as a linear autoencoder in the unsupervised learning setting. Meanwhile, the obtained PCs can also be used to build a regression model for the labelled key (response) variable, i.e., PCR. However, PCA extracts the PCs without reference to the variance of the key variables, so those PCs may not be related to the variation of the key variable. In contrast to PCA, PLS obtains latent variables (LVs) by decomposing key variables and process variables simultaneously. These LVs maximise the covariance between the key variables (Y) and the process variables (X), and thus account for both the correlation between Y and X and the variation within each (Guo et al., 2016). Similarly, the linear relationship among the LVs can also be used for regression, namely PLSR. The relationship among PCA, PCR, PLS, and PLSR is shown in Figure 3, where P and Q are the loadings, and T and U are the corresponding scores of the LVs.

LVP models have been widely applied to process monitoring (Fezai et al., 2018; Jiang et al., 2016; Zhang et al., 2019), soft sensing (Chiplunkar and Huang, 2019; Kaneko and Funatsu, 2016; Poerio and Brown, 2018), outlier detection (Peng et al., 2012), and missing data handling (Bao et al., 2015), and have become a basic tool for data-driven modelling. Because of space limitations, only part of the related work is reviewed here.

Figure 3 The relationship among PCA, PCR, PLS, and PLSR (see online version for colours)
[Diagram labels: X, T, U, Y, Pᵀ, Qᵀ, PCA, PCR, PLS, PLSR.]

(3) Independent component analysis (ICA)

Motivated by the poor capability of PCA in dealing with non-Gaussian data, an emerging technique, namely independent component analysis (ICA), is used for modelling. Unlike PCA, ICA aims to find statistically independent, non-Gaussian components through a separating matrix (shown in Figure 4). For this reason, non-Gaussianity is the fundamental requirement; without it, the estimation is not possible at all. Different objective functions can be derived from the principle of maximum non-Gaussianity, and optimising these objective functions yields an estimate of the ICA model (Hyvärinen and Oja, 2000). ICA was originally proposed for blind source separation (Back and Weigend, 1997) and was later introduced into data-driven modelling methods (Ding et al., 2017; Lee et al., 2007; Liu and Zhang, 2015; Liu et al., 2019a; Zhang and Qin, 2007).

Figure 4 The principle of ICA (see online version for colours)
[Diagram: non-independent data in the (X1, X2) plane pass through a separating step to give independent data in the (X1, X2) plane.]

(4) Probabilistic model

Probabilistic models have a structure similar to LVP and include factor analysis (FA), probabilistic PCA (PPCA), probabilistic PLS (PPLS), etc. Because process variables are often contaminated with uncertainties or noise as a consequence of transmission disturbances and measurement variations, it is more reasonable to adopt a probabilistic framework in industrial processes that treats each process variable as a random variable (Yuan et al., 2017). Taking FA as an example, the original variables are treated as a linear combination of factors plus noise (shown in Figure 5), where the variables, factors, and noise are all described by probability density functions (PDFs). Therefore, unlike deterministic PCA, FA is usually fitted by maximum likelihood estimation using the EM algorithm, which alternates E-steps and M-steps (Ge, 2016). In addition, the specific probabilistic model is determined by the assumption on the noise variance (Kim and Lee, 2003); for example, an isotropic noise variance leads to PPCA, whereas a diagonal (variable-specific) noise variance leads to FA. The main application of probabilistic models is dealing with noisy data (Zhao et al., 2015; Raveendran et al., 2018; Zheng and Song, 2018).
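The EM procedure for FA mentioned above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited works: it assumes the standard FA model x = Wz + μ + e with z ~ N(0, I) and e ~ N(0, Ψ), Ψ diagonal, and uses the textbook closed-form E- and M-step updates (see, e.g., Bishop, 2006). The function name `fit_factor_analysis_em` and its arguments are hypothetical.

```python
import numpy as np


def fit_factor_analysis_em(X, n_factors, n_iter=200, seed=0):
    """Hypothetical helper: fit x = W z + mu + e by EM.

    Assumes z ~ N(0, I) and e ~ N(0, diag(psi)). Returns the loading matrix W,
    the noise variances psi, the mean mu, and the log-likelihood recorded at the
    start of each iteration (EM guarantees it never decreases).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)                    # ML estimate of the mean
    Xc = X - mu
    S = Xc.T @ Xc / n                      # sample covariance (d x d)
    W = rng.normal(scale=0.1, size=(d, n_factors))
    psi = np.diag(S).copy()
    loglik = []
    for _ in range(n_iter):
        # Log-likelihood under current parameters: x ~ N(mu, W W^T + Psi)
        C = W @ W.T + np.diag(psi)
        _, logdet = np.linalg.slogdet(C)
        loglik.append(-0.5 * n * (d * np.log(2 * np.pi) + logdet
                                  + np.trace(np.linalg.solve(C, S))))
        # E-step: posterior mean and second moment of each factor vector z_i
        Wt_Pinv = W.T / psi                # W^T Psi^{-1}, shape (k x d)
        G = np.linalg.inv(np.eye(n_factors) + Wt_Pinv @ W)  # posterior covariance
        Ez = Xc @ (G @ Wt_Pinv).T          # rows are E[z_i], shape (n x k)
        Ezz = n * G + Ez.T @ Ez            # sum over i of E[z_i z_i^T]
        # M-step: closed-form updates of the loadings and diagonal noise variances
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        psi = np.diag(S - W @ (Ez.T @ Xc) / n).copy()
        psi = np.maximum(psi, 1e-6)        # guard against collapsing variances
    return W, psi, mu, loglik
```

For example, `W, psi, mu, ll = fit_factor_analysis_em(X, n_factors=2)` fits a two-factor model like the one in Figure 5, and `ll` can be checked to confirm that the log-likelihood never decreases between iterations. If the per-variable update of `psi` is replaced by its average over all variables, the same loop becomes the EM algorithm for PPCA, which illustrates how the noise-variance assumption selects the probabilistic model.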
Figure 5 The principle of FA (two factors) (see online version for colours)
[Diagram labels: X, Noise, F1, F2, α1, α2.]

(5) Canonical correlation analysis (CCA)

As a supervised learning method for process monitoring, CCA projects the data onto the subspaces in which X and Y are most correlated (shown in Figure 6) (Zhu et al., 2017). Thus, both Tx² and Ty² are required for judgement: the first index examines whether a fault occurs in X by referring to the correlated information from Y, and, similarly, the second index examines faults in Y by referring to the correlated information from X (Jiang et al., 2019). However, to obtain the key variable, either the predictive model must be highly accurate, or the key variable must be assumed to be available online or measurable without a large time delay (Liu et al., 2018b). Because it pays little attention to variance analysis, CCA is not as popular as other data-driven methods. However, it is still an optional method for industrial process modelling (Ogura, 2010; Pilario and Cao, 2018).

Figure 6 The principle of CCA (see online version for colours)
[Diagram labels: X, Y, maximum correlation, X*, Y*, Pᵀ, Qᵀ.]

3.1.2 Nonlinear models

(1) Support vector model

SVM and support vector regression (SVR) are two well-known nonlinear supervised learning methods. SVM is associated with binary classification problems, whereas SVR is mostly used for regression. Generally, apart from differences in the constraints of the optimisation objective function, SVM and SVR can be unified as SVM (Kadlec et al., 2009). The essence of SVM is to find a best-fitting function; for classification, a decision hyperplane is sought that splits a given dataset into two sub-datasets (shown in Figure 7). Its fundamental advantage is that SVM uses the principle of structural risk minimisation (SRM) to balance the complexity of the approximating function against the empirical error on the training data. Meanwhile, a suitable optimisation strategy can overcome its shortcomings in modelling large datasets (Guo et al., 2016). For example, Kaneko and Funatsu (2014) developed an online soft sensor for plants with various states; Pooyan et al. (2015) separated each faulty class from the others and detected the unknown space between those faults; and Ben Ishak (2016) made a comparative study of variable selection using SVM and RF.

Besides, by reducing the binary-class problem to a one-class problem, SVM becomes support vector data description (SVDD), which is considered a special form of SVM for unsupervised problems (Quiñones-Grueiro et al., 2019). Based on the principle of minimum hypersphere volume in feature space, the spherically shaped boundary of a dataset can be estimated for detecting novel data or outliers. The main application of SVDD is in developing process monitoring models (Ge et al., 2017).

Figure 7 The principle of SVM (see online version for colours)
[Diagram labels: Hyperplane, Support vectors, Margin, X.]

(2) Artificial intelligence (AI) techniques

With the development of modern computers, artificial intelligence (AI) techniques have been extensively used in many fields, including process modelling, computer vision, natural language processing, and bioinformatics (Gopakumar et al., 2018). The basic unit of AI networks is the neuron, which is inspired by biological neural systems. Neurons and their associated inputs and outputs are connected directly by numerical weights. By tuning the value of each weight, the network adapts to its inputs and becomes capable of learning, in a way loosely analogous to the human brain. AI techniques comprise a family of network architectures, such as the feature extraction network (FEN), multi-layer perceptron (MLP), and recurrent neural network (RNN).

FEN is a type of unsupervised learning method for producing a nonlinear feature space from the original data space. Applications related to FEN include several methods: Liukkonen et al. (2013) developed a self-organising map (SOM)-based approach for monitoring industrial wastewater treatment. Shang et al. (2014) pre-trained network weights on process data using a deep belief network (DBN), which served as the initial weights of subsequent supervised learning for product quality prediction in a crude-oil distillation unit. Wu and Zhao (2018) presented a
fault diagnosis algorithm for the Tennessee Eastman (TE) process that uses a convolutional neural network (CNN) for feature extraction.

The MLP is an extension of the traditional artificial neural network (ANN) with several hidden layers. Liukkonen et al. (2012) built a dynamic MLP model to estimate the nitrogen oxide content of flue gas. Souza et al. (2013) utilised MLP for variable selection, under the assumption that two models identified with different inputs yield similar mean square errors.

RNN is a neural network model for time-correlated data, and its memory is reflected in its recurrent connections. This memory is carried by the states of the neurons, which are stored and used for predicting the next state in the sequence (Smith and Jin, 2014). Sun and Ge (2019) developed a soft sensor model for a carbon dioxide absorption column using a Gaussian-Bernoulli restricted Boltzmann machine and RNN. Duchanoy et al. (2017) successfully used RNN to predict the contact area between car tyres and the ground.

The relationship among the above techniques is shown in Figure 8.

Figure 8 The relationship between FEN, MLP, and RNN (see online version for colours)
[Diagram labels: X, FEN, Features, MLP, Y, RNN, Previous neurons, Current neurons.]

(3) Nonlinear latent variable projection (NLVP) model

To extend the linear LVP model to nonlinear applications, there are three main methods: the input extension model, the polynomial projection model, and the kernel LVP model (shown in Figure 9). Input extension is the simplest nonlinear latent variable projection (NLVP) model. It extends the variable space with nonlinear combinations of the original inputs (such as squared values, cross-products, and logarithms) and then performs linear LVP analysis on the extended data space (Baffi et al., 1999). Wold et al. (1989) presented the polynomial projection algorithm, in which the inner relations between LVs are modified into a nonlinear (polynomial) relationship. Because prior knowledge of the underlying nonlinear relationships is usually lacking, these two methods have not received as much attention as kernel LVP.

The kernel LVP (KLVP) retains the linear framework of the traditional LVP. It first maps the original inputs onto a feature space, namely, the reproducing kernel Hilbert space (RKHS). The RKHS can be treated as the generalised space form used in SVM and GPR (Rosipal and Trejo, 2001). The LVs are then obtained in this functional space. Kernel-based methods are widely used in industrial processes, including model division (Khediri et al., 2012; Zhai et al., 2018), soft sensors (Yuan et al., 2014a; Zhang et al., 2017), and process monitoring (Fan et al., 2014; Fezai et al., 2016, 2018).

Figure 9 The principle of three nonlinear LVP models (see online version for colours)
[Diagram panels: Input extension model, Polynomial projection model, Kernel LVP model; labels X, T, Pᵀ.]

(4) Gaussian process regression (GPR)

GPR is based on the concept of a Gaussian process (GP), which represents a distribution over functions from the viewpoint of the normal distribution. In this view, the function value at each sample point in the original space is treated as a random variable, and any finite collection of such values follows a joint Gaussian distribution. A GP can be seen as the generalisation of the multivariate normal distribution to an infinite-dimensional function space (Rasmussen and Williams, 2006). Inference of continuous variables with a GP prior in a Bayesian framework is known as GPR. Using the distribution over the function space and Bayesian inference, GPR provides the posterior distribution of the requested estimate (shown in Figure 10). This distributional estimate is a crucial difference between GPR and other methods. In recent years, GPR has mostly been used for predictive purposes, such as soft sensor modelling and output- or quality-related monitoring. Zhou et al. (2015) developed an adaptive quality monitoring model for a fed-batch process using the prediction errors of GPR; Liu et al. (2017) made multi-step predictions of pollutant removal in wastewater treatment plants using GPR; and DuyTrinh et al. (2019) presented a fuzzy neural network combined with a GPR model for predicting grinding wheel wear and the surface roughness of an alloy. In addition, Wang and Mao (2019) extended GPR to outlier detection.

3.1.3 Adaptive models

(1) Offset-based calibration

An output offset arises from questionable data reliability or from drift caused by ageing sensors. The offset-based calibration technique uses the deviation between the output
feedback and the online prediction to correct the upcoming predicted output (Ahmed et al., 2009). This method is therefore mainly used in applications under supervised learning frameworks. For example, Ni et al. (2011) used the output bias to robustly update a GPR model by adding an offset smoother; Ahmed et al. (2015) incorporated the offset-based scheme into a least squares support vector machine (LSSVM) to predict the nitrogen oxide discharged from a coal-fired power plant.

Figure 10 Illustration of GPR for the posterior distributions (see online version for colours)
[Plot of Y against X showing the true function, the confidence interval and the training data.]

(2) Recursive method

The recursive method uses the previously obtained model and the current feedback (through sample-wise or block-wise operations) to update the model. This avoids full recalculation and thus reduces the modelling cost. Two strategies are widely used for model maintenance: the direct calculation approach and the data replacement approach. In direct calculation approaches, the desired parameter is obtained directly by adding an incremental term to the old parameters. Typical methods include recursive least squares (RLS) (LeBreux et al., 2012), recursive Bayesian parameter estimation (Jing et al., 2017) and the Kalman filter (Ma et al., 2009). In data replacement approaches, the continuously accumulating data are replaced by an equivalent matrix stored at the last iteration step, so the original data need not be stored. Examples include recursive PCA (RPCA) (Hu et al., 2014), recursive PLS (RPLS) (Poerio and Brown, 2018) and recursive GPR (Zhou et al., 2015). In addition, a forgetting factor can be used to set the adaptation strength of recursive models (Kadlec et al., 2011).

(3) Moving window (MW) approach

In the moving window (MW) strategy, the old model is calibrated using a set of measurements with maximum relevance to the current working condition. Commonly, a fixed-length window containing the most recent measurements is used to ensure this relevance (Kadlec et al., 2011). As the window slides along the record sequence, the newest samples are added and the same number of the oldest measurements are excluded. The identified model must then be recalculated as well (Fezai et al., 2016; Jiang and Yan, 2013). Rebuilding the model every time the window slides is not always necessary, so a hypothesis-testing scheme can decide whether the model needs to be modified (Shao and Tian, 2015; Yao and Ge, 2017).

3.2 Different model structures

3.2.1 Switch-based model

In switch-based models, a specified number of sub-models is first constructed from the training set. Each sub-model corresponds to a steady operating mode and has an associated membership function. The final model output is then given by the sub-model with the maximum membership (shown in Figure 11). For data-driven models, the sub-models are usually unknown in advance. The database must therefore be separated into several mutually exclusive subsets by clustering algorithms, so that measurements in the same subset are similar to one another and different from the measurements in the other subsets (Quiñones-Grueiro et al., 2019).

Figure 11 The framework of switch-based model (see online version for colours)
[Diagram: training set → sub-models 1 to M → final model.]

The k-means method generates measurement clusters according to criteria such as Euclidean distance, Mahalanobis distance or angle, and it is widely used for switch-based models. For example, Wang et al. (2012) used the k-means algorithm for model division and introduced a two-step ICA-PCA analysis for furnace temperature monitoring in a continuous annealing line; Khediri et al. (2012) divided a nonlinear process model using kernel k-means clustering; Du et al. (2017) defined a new similarity index between different models using k-means clustering. Besides k-means clustering, the piecewise affine (PWA) model computes the affine map using the identified parameters of each sub-model and the corresponding partition of the regressor
space (Bemporad et al., 2003; Breschi et al., 2016). Both k-means and PWA require the number of clusters to be specified at the initialisation step. Alternatively, the number of clusters can be determined during clustering by the affinity propagation (AP) clustering algorithm (Liang et al., 2019; Zhang et al., 2018) or by the subtractive clustering algorithm (Pan et al., 2010; Norhayati and Rashid, 2018).

3.2.2 Mixture model

Unlike the switch-based division strategy, the mixture model combines the weighted sub-model outputs into the final output by linear superposition (shown in Figure 12). The main issue of the mixture model is how to assign a weight to each sub-model, and several mixture algorithms with different assignment rules have been presented. For example, the fuzzy clustering method is an extension of k-means in which each sample has a set of membership values, one for each sub-model (Quiñones-Grueiro et al., 2019). Moreover, by taking different distance spaces into consideration, Gustafson-Kessel (GK) clustering (Kim et al., 2004; Teslić et al., 2011) and fuzzy c-means (FCM) clustering (Askari et al., 2017; Qi et al., 2014) have been developed.

Figure 12 The framework of mixture model (see online version for colours)
[Diagram: training set → sub-models 1 to M, weighted by ω1 to ωM → ∑ → final model.]

Additionally, the Gaussian mixture model (GMM) is a probabilistic model in which the distribution of a dataset is represented as a composite of a finite number of local Gaussian distributions. The membership value is then replaced by the posterior probability of each observation belonging to each sub-model, obtained within a Bayesian framework. Given a sufficient number of Gaussian components and their linear combination, a GMM can approximate almost any continuous density (Yuan et al., 2014b). The GMM therefore performs well in complicated industrial processes (Frigieri et al., 2016; Peng et al., 2015). The GMM is also helpful in processing non-Gaussian data (Ge et al., 2017).

3.2.3 Ensemble learning model

Ensemble learning is a powerful method that integrates several sub-models to generate the final model. It may seem equivalent to the mixture model discussed above. In practice, however, ensemble learning is a random re-sampling algorithm rather than a model division technique (Cao et al., 2010). A given sample may be drawn more than
once or not at all, which diversifies the samples used by each sub-model (shown in Figure 13).

Figure 13 The framework of ensemble learning model (see online version for colours)
[Diagram: training set → re-sampling (nonduplicate data extraction) → base models 1 to M, weighted by ω1 to ωM → ∑ → final model.]

Ensemble learning is especially applicable to sensitive models, i.e., models whose outputs change markedly with small changes in the training data (Ge and Song, 2014). Bootstrap aggregation (bagging) and the AdaBoost family of algorithms (boosting) are two popular techniques. Bagging builds an ensemble from several individual models, each trained on a random redistribution (bootstrap replicate) of the dataset (Breiman, 1996). Boosting combines rough and moderately inaccurate rules of thumb to produce an accurate prediction rule, which yields a general and provably effective predictive model (Cao et al., 2010). Besides bagging and boosting, random subspace ensembles perform the re-sampling along the direction of the variables (i.e., in the feature space) of the input data (Ge and Song, 2014; Jadhav et al., 2014). Ensemble learning has now become more general than models with a predefined structure, and the data collection for each individual model is becoming increasingly flexible (Poerio and Brown, 2018; Shao and Tian, 2015).

3.2.4 Local learning model

Local learning is an instance-based learning algorithm, mostly used for supervised learning. It is also termed lazy learning or just-in-time learning (JITL) (Cheng and Chiu, 2004). Unlike traditional modelling methods, local learning builds a disposable model on demand for each query (shown in Figure 14). This disposable model is built from relevant samples selected by the k-nearest neighbours (kNN) criterion. The predictive output for the query is then obtained from the local model, which may be used for either regression or classification (Ge et al., 2017). Local learning provides an attractive alternative for nonlinear process modelling by extracting local linear models from the training set. For example, Cheng and Chiu (2005) used the residual between local model predictions and process outputs to monitor the reaction process of continuous stirred tank reactors (CSTR); Liu et al. (2012) proposed a cumulative similarity factor in the local model, which combines a weighted similarity to the query with the size of the relevant set; Hu et al. (2013) used the local kernel space for batch process monitoring; Su et al. (2016) proposed a local state-space model for representing the
high nonlinearity in batch processes; Yuan et al. (2017) constructed a locally weighted log-likelihood function to approximate the nonlinear relationships in the data; Zheng et al. (2018) developed a semi-supervised soft sensor model by extracting information from local models that use both labelled and unlabelled data.

Figure 14 The framework of instance-based local model (see online version for colours)
[Diagram: training set and query → relevant data → local model → output (instance-based model).]

3.2.5 Transfer learning model

As a sub-field of machine learning, transfer learning handles a target domain with no labelled data, or too few labelled samples to train any model. A transferred model can be constructed by allowing the distributions, domains and tasks to differ between the training and testing phases (Pan and Yang, 2010). This assumes, of course, that at least one related source domain contains enough samples to provide a primary basis for modelling and for the associated information extraction. Proper identification of the source domain is also essential, because samples from dramatically different source domains may adversely affect training, an effect termed negative transfer (Salaken et al., 2017). Transfer learning is mostly used in image recognition, text classification, etc. (Pan and Yang, 2010). Only in recent years has it been introduced to process modelling (Facco et al., 2012, 2014; Kang, 2018; Liu et al., 2019b; Shao et al., 2019; Tomba et al., 2014). In essence, transfer learning is a domain-based adaptation method in which a model trained on the source domain is calibrated on the target domain (Kang, 2018; Salaken et al., 2017).

4 Perspectives for future research

Over the past decades, data-driven approaches for industrial processes have attracted widespread attention from both academics and engineers. They have benefited from distributed control systems (DCS) and the power of modern computers. Meanwhile, more and more newly developed data-driven algorithms have been introduced to accommodate new industrial characteristics. However, some fundamental issues related to data-driven approaches should be considered for process modelling.

4.1 Practical applications

The challenges of moving from theoretical research to practical applications still urgently need to be addressed (Quiñones-Grueiro et al., 2019). This is mainly because a practical application behaves more like a stochastic discrete event system with respect to multi-rate sampling, missing data, dynamic characteristics and maintenance frequency. These uncertain elements should be accounted for when developing process monitoring, soft sensing, fault diagnosis, etc. From a practical viewpoint, process modelling methods should handle these issues online rather than waiting for offline analysis.

4.2 Data mining and analysing

With increasing requirements on product quality, modern plant-wide processes are becoming more and more complex. Most modelling techniques are limited to product quality prediction, normal working condition monitoring, fault classification and model division. Tracing the root cause of process changes, however, is still difficult. Cause-and-effect relationships within the dataset should therefore be considered in further analysis, to reveal the root cause of process changes and to guide process operation. Several methods have been used, e.g., Bayesian networks (Gonzalez et al., 2015) and relative importance analysis (Yan et al., 2017). Their application to plant-wide processes still needs to be investigated (Ge, 2017).

4.3 Diversified data

In the era of big data, massive volumes of data are collected from industrial processes. Besides the depth of data mining, attention should also be paid to the diversity of the collected data, such as spectral, image, audio and video data. Such diversified data still contains information related to the nature of industrial processes, and it is much easier and faster to collect. However, combining traditional and diversified data, or using diversified data on its own for process modelling, remains a great challenge for future work.

4.4 Economic model

For reasons of sustainability, process modelling has recently begun to consider not only energy efficiency but also environmental protection (Hanes and Bakshi, 2015). As mentioned in an earlier section, modelling methods can be formulated as optimisation problems. By introducing new constraints inspired by economic model predictive control (EMPC) (Ellis et al., 2014), both the energy efficiency and the environmental sustainability of the process can be monitored as well. To this end, how to effectively incorporate process knowledge into models of industrial processes needs more consideration in future work.
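The input extension form of nonlinear LVP (the simplest NLVP model, discussed alongside Figure 9) can be sketched in a few lines. This is an illustrative assumption rather than the algorithm of Baffi et al. (1999). The extension uses only squared terms and pairwise cross-products, and principal component regression (PCA on the extended inputs followed by least squares on the scores) stands in for the linear LVP step. The function names are hypothetical.

```python
import numpy as np


def extend_inputs(X):
    """Append squared terms and pairwise cross-products to the original inputs."""
    n, m = X.shape
    cross = [X[:, i] * X[:, j] for i in range(m) for j in range(i + 1, m)]
    cross = np.column_stack(cross) if cross else np.empty((n, 0))
    return np.hstack([X, X ** 2, cross])


def fit_pcr(Z, y, n_components):
    """Linear LVP step (here PCA) on Z, then least squares on the latent scores."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(Z - z_mean, full_matrices=False)
    P = Vt[:n_components].T                    # loadings
    T = (Z - z_mean) @ P                       # latent variable scores
    b, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return z_mean, y_mean, P, b


def predict_pcr(model, Z):
    z_mean, y_mean, P, b = model
    return y_mean + (Z - z_mean) @ P @ b


# Usage: a quadratic process that a purely linear projection cannot capture
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 0] * X[:, 1] + 0.01 * rng.standard_normal(200)

Z = extend_inputs(X)                           # 2 inputs + 2 squares + 1 cross-product
rmse_ext = np.sqrt(np.mean((predict_pcr(fit_pcr(Z, y, 5), Z) - y) ** 2))
rmse_lin = np.sqrt(np.mean((predict_pcr(fit_pcr(X, y, 2), X) - y) ** 2))
```

With all five components, PCR reduces to ordinary least squares on the extended inputs. Keeping fewer components gives the usual dimension reduction of LVP.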
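Figure 10 illustrates a GPR posterior distribution. As an illustration only, the posterior mean and variance can be computed with the standard Cholesky-based equations (as in Rasmussen and Williams, 2006). The sketch assumes a squared-exponential kernel with fixed, hand-picked hyperparameters (length scale 0.5, signal variance 1, noise variance 1e-4), not hyperparameters learned from the data. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=0.5, signal_var=1.0):
    """Squared-exponential covariance k(a, b) = s^2 exp(-|a - b|^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)


def gpr_posterior(X_train, y_train, X_test, noise_var=1e-4):
    """Posterior mean and variance of the latent function at X_test."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(v ** 2, axis=0)
    return mean, var


X_train = np.linspace(-1, 1, 8)[:, None]
y_train = np.sin(3 * X_train[:, 0])
X_test = np.vstack([X_train, [[5.0]]])      # the training inputs plus one distant point
mean, var = gpr_posterior(X_train, y_train, X_test)
half_width = 1.96 * np.sqrt(np.maximum(var, 0.0))   # approx. 95% confidence interval
lower, upper = mean - half_width, mean + half_width
```

Near the training data the interval is narrow. Far from the data the posterior reverts to the prior (mean 0, variance 1), which is the widening band seen in plots such as Figure 10.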
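A minimal sketch of offset-based calibration (Section 3.1.3 (1)) follows. It assumes a fixed base predictor. It also exponentially smooths the deviation between each output feedback and the online prediction. The smoothing factor `alpha` and the class name are hypothetical choices for illustration, not the smoother used by Ni et al. (2011).

```python
import numpy as np


class OffsetCalibratedModel:
    """Fixed base predictor plus a smoothed output offset (alpha is an assumed value)."""

    def __init__(self, base_predict, alpha=0.3):
        self.base_predict = base_predict
        self.alpha = alpha
        self.offset = 0.0

    def predict(self, x):
        return self.base_predict(x) + self.offset

    def update(self, x, y_measured):
        """Use feedback (e.g., a laboratory analysis) to correct future predictions."""
        residual = y_measured - self.predict(x)   # feedback minus online prediction
        self.offset += self.alpha * residual      # exponential smoothing of the bias


base = lambda x: 2.0 * x                          # model identified before the drift
model = OffsetCalibratedModel(base, alpha=0.3)
for x in np.linspace(0.0, 1.0, 30):
    model.update(x, 2.0 * x + 0.5)                # the process has drifted by +0.5
```

A small `alpha` filters noisy feedback but tracks drift slowly; a large `alpha` reacts quickly but passes more measurement noise into the predictions.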
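As a minimal sketch of the direct calculation approach in Section 3.1.3 (2), the standard recursive least squares (RLS) update with a forgetting factor λ is shown below. The new parameter vector is the old one plus a gain-weighted prediction error. The class name, the initial covariance `p0` and λ = 0.95 are illustrative assumptions.

```python
import numpy as np


class RecursiveLeastSquares:
    """RLS with forgetting factor lam (0 < lam <= 1); smaller lam forgets old data faster."""

    def __init__(self, n_params, lam=0.95, p0=1e3):
        self.theta = np.zeros(n_params)
        self.P = p0 * np.eye(n_params)     # large initial covariance = weak prior
        self.lam = lam

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        error = y - x @ self.theta                   # a priori prediction error
        self.theta = self.theta + gain * error       # old parameters + incremental term
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return error


rng = np.random.default_rng(1)
rls = RecursiveLeastSquares(n_params=2, lam=0.95)
for t in range(600):
    true_theta = np.array([1.0, -2.0]) if t < 300 else np.array([3.0, 0.5])  # process shift
    x = rng.standard_normal(2)
    rls.update(x, x @ true_theta + 0.01 * rng.standard_normal())
```

With λ = 0.95 the effective memory is roughly 1/(1 − λ) = 20 samples, so the estimate settles on the post-shift parameters without re-identifying the model from scratch.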
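The moving window approach of Section 3.1.3 (3) can be sketched as follows. The sketch assumes a linear least-squares model; this is an illustrative choice, and any identification method could be re-run on the window. Python's `collections.deque` with `maxlen` discards the oldest sample automatically as each new one arrives. This version re-identifies the model after every sample. The hypothesis-testing variant, which decides whether an update is needed, is not shown.

```python
from collections import deque

import numpy as np


class MovingWindowModel:
    """Linear model re-identified on the `window` most recent samples."""

    def __init__(self, window=50):
        self.X = deque(maxlen=window)   # appending beyond maxlen drops the oldest sample
        self.y = deque(maxlen=window)
        self.coef = None

    def update(self, x, y):
        self.X.append(np.atleast_1d(np.asarray(x, dtype=float)))
        self.y.append(float(y))
        A = np.column_stack([np.ones(len(self.X)), np.array(self.X)])  # intercept + inputs
        self.coef, *_ = np.linalg.lstsq(A, np.array(self.y), rcond=None)

    def predict(self, x):
        return self.coef[0] + np.atleast_1d(np.asarray(x, dtype=float)) @ self.coef[1:]


rng = np.random.default_rng(2)
mw = MovingWindowModel(window=50)
for t in range(400):
    x = rng.uniform(0.0, 1.0)
    y = 1.0 + 2.0 * x if t < 200 else -1.0 + 0.5 * x   # the process changes at t = 200
    mw.update(x, y)
```

Fifty samples after the change, the window holds only post-change data, so the old operating condition no longer influences the model.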
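A switch-based model (Section 3.2.1) can be sketched with k-means clustering and one linear sub-model per cluster. Here, "maximum membership" is taken to mean the nearest cluster centre in Euclidean distance. The farthest-point initialisation, the linear sub-models and all names are illustrative assumptions.

```python
import numpy as np


def assign(X, centres):
    """Hard membership: index of the nearest centre (Euclidean distance)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)


def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d2)])
    centres = np.array(centres)
    for _ in range(n_iter):
        labels = assign(X, centres)
        centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return centres, assign(X, centres)


class SwitchModel:
    """One linear sub-model per cluster; each prediction switches to one sub-model."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.centres, labels = kmeans(X, self.k)
        self.coefs = []
        for j in range(self.k):
            Xj, yj = X[labels == j], y[labels == j]
            A = np.column_stack([np.ones(len(Xj)), Xj])
            self.coefs.append(np.linalg.lstsq(A, yj, rcond=None)[0])
        return self

    def predict(self, X):
        labels = assign(X, self.centres)
        return np.array([self.coefs[j][0] + x @ self.coefs[j][1:]
                         for x, j in zip(X, labels)])


rng = np.random.default_rng(3)
XA = rng.normal([0.0, 0.0], 0.5, size=(100, 2))     # operating mode A
XB = rng.normal([10.0, 10.0], 0.5, size=(100, 2))   # operating mode B
X = np.vstack([XA, XB])
y = np.concatenate([1.0 + XA[:, 0], -5.0 + 2.0 * XB[:, 1]])
model = SwitchModel(k=2).fit(X, y)
pred = model.predict(np.array([[0.2, -0.1], [10.3, 9.8]]))
```

The number of clusters `k` must be given in advance, which is the limitation that AP or subtractive clustering addresses.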
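For the mixture model of Section 3.2.2, the sketch below combines local linear sub-models by linear superposition, using fuzzy c-means style memberships as the weights ω_j. Several assumptions apply. The cluster centres are given in advance, e.g., from an earlier FCM or k-means run. Each sub-model is fitted by weighted least squares with weights u^m. The fuzzifier m = 2 is a common default, not a value taken from the cited works. Class and function names are hypothetical.

```python
import numpy as np


def fuzzy_memberships(X, centres, m=2.0, eps=1e-12):
    """FCM-style memberships u_ij = 1 / sum_l (d_ij / d_il)^(2/(m-1)); rows sum to 1."""
    d = np.sqrt(((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)) + eps
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


class FuzzyMixtureModel:
    """Local linear sub-models combined by a membership-weighted sum."""

    def __init__(self, centres, m=2.0):
        self.centres = np.asarray(centres, dtype=float)
        self.m = m

    def _design(self, X):
        return np.column_stack([np.ones(len(X)), X])

    def fit(self, X, y):
        U = fuzzy_memberships(X, self.centres, self.m)
        A = self._design(X)
        self.coefs = []
        for j in range(len(self.centres)):
            w = np.sqrt(U[:, j] ** self.m)           # weighted least squares, weights u_ij^m
            self.coefs.append(np.linalg.lstsq(A * w[:, None], y * w, rcond=None)[0])
        return self

    def predict(self, X):
        U = fuzzy_memberships(X, self.centres, self.m)          # weights omega_j(x)
        sub_outputs = self._design(X) @ np.column_stack(self.coefs)
        return (U * sub_outputs).sum(axis=1)                     # linear superposition


x = np.linspace(-2, 2, 81)[:, None]
y = np.tanh(2 * x[:, 0])                     # smooth nonlinear process
model = FuzzyMixtureModel(centres=[[-1.0], [0.0], [1.0]]).fit(x, y)
U = fuzzy_memberships(x, model.centres)
y_hat = model.predict(x)
```

Unlike the switch-based sketch, every sub-model contributes to every prediction. This smooths the transitions between operating regions.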
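Bagging (Section 3.2.3) can be sketched as bootstrap re-sampling followed by equal-weight averaging. The 1-nearest-neighbour base learner is an illustrative choice of a high-variance model, and the function names are hypothetical. Boosting and random subspace ensembles are not shown.

```python
import numpy as np


def fit_1nn(X, y):
    """A deliberately high-variance base learner: 1-nearest-neighbour regression."""
    def predict(Xq):
        d2 = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        return y[np.argmin(d2, axis=1)]
    return predict


def fit_bagging(X, y, base_fit, n_models=25, seed=0):
    """Train each base learner on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # a sample may be drawn several times or not at all
        models.append(base_fit(X[idx], y[idx]))
    return models


def predict_bagging(models, Xq):
    """Equal-weight aggregation of the base learners' outputs."""
    return np.mean([model(Xq) for model in models], axis=0)


rng = np.random.default_rng(4)
X = rng.uniform(0, 2 * np.pi, size=(150, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(150)
models = fit_bagging(X, y, fit_1nn, n_models=25)
Xq = np.linspace(0, 2 * np.pi, 50)[:, None]
y_bag = predict_bagging(models, Xq)
```

Replacing the equal-weight mean with weights ω_j, as drawn in Figure 13, is a one-line change in `predict_bagging`.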
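A just-in-time learning prediction (Section 3.2.4) can be sketched in four steps. First, select the k training samples nearest to the query. Then fit a local linear model on them, predict, and discard the model. Unweighted Euclidean distance, k = 20 and the function name are illustrative assumptions. Practical JITL variants often use weighted similarity measures, such as that of Liu et al. (2012).

```python
import numpy as np


def jitl_predict(X_train, y_train, x_query, k=20):
    """Build a disposable local linear model for one query, predict, then discard it."""
    d2 = np.sum((X_train - x_query) ** 2, axis=1)     # Euclidean distance criterion
    nn = np.argsort(d2)[:k]                           # relevant set: k nearest neighbours
    A = np.column_stack([np.ones(len(nn)), X_train[nn]])
    coef, *_ = np.linalg.lstsq(A, y_train[nn], rcond=None)
    return coef[0] + x_query @ coef[1:]


X_train = np.linspace(-2, 2, 401)[:, None]
y_train = np.sin(3 * X_train[:, 0])                   # globally nonlinear process
x_query = np.array([0.5])
y_local = jitl_predict(X_train, y_train, x_query)
```

No global model is stored. The cost moves to query time, because every prediction requires a neighbour search and a small regression.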
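One simple way to realise the "train on the source domain, calibrate on the target domain" idea of Section 3.2.5 is to shrink the target-domain least-squares estimate towards the source-domain parameters. The sketch below is an illustrative parameter-transfer scheme under that assumption, not a method from the cited works. The penalty weight `lam` controls how strongly the source model is trusted, and the function names are hypothetical.

```python
import numpy as np


def fit_ls(X, y):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]


def transfer_fit(X_target, y_target, theta_source, lam=1.0):
    """Minimise ||y - A theta||^2 + lam * ||theta - theta_source||^2 (closed form)."""
    A = np.column_stack([np.ones(len(X_target)), X_target])
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y_target + lam * theta_source)


rng = np.random.default_rng(5)
Xs = rng.uniform(0, 1, size=(500, 1))
ys = 1.0 + 2.0 * Xs[:, 0] + 0.05 * rng.standard_normal(500)   # data-rich source domain
theta_source = fit_ls(Xs, ys)

Xt = np.array([[0.2], [0.8]])                                 # two labelled target samples
yt = 1.2 + 2.1 * Xt[:, 0]                                     # related but shifted target process
theta_target = transfer_fit(Xt, yt, theta_source, lam=0.1)
```

As `lam` tends to 0 the result is the target-only fit, and as `lam` grows it reverts to the source model. A source domain that is too different (negative transfer) shows up as a poor fit for any large `lam`.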

5 Conclusion

In this paper, several of the most widely used machine learning algorithms for industrial process modelling have been reviewed in terms of methodology, mathematical formulation and model structure selection. Extensions and open challenges for future research have also been discussed. Data-driven modelling is, in essence, a field that relies on information fusion. From the perspective of algorithms, no single method is universal across all industrial situations. Similarly, from the perspective of applications, one modelling task may require several algorithms. Data-driven methods therefore have important potential for both practical applications and academic research, and they play an important role in industrial processes.

Acknowledgement

The project is funded in part by the National Natural Science Foundation under Grant 61873113, the Key R&D Program of Jiangsu Province, China under Grant BE2018370, and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX17 1785).

References

Abyaneh, H.Z. (2014) ‘Evaluation of multivariate linear regression and artificial neural networks in prediction of water quality parameters’, Journal of Environmental Health Science and Engineering, Vol. 12, p.40.
Ahmad, I., Kano, M., Hasebe, S., Kitada, H. and Murata, N. (2014) ‘Gray-box modeling for prediction and control of molten steel temperature in Tundish’, Journal of Process Control, Vol. 24, No. 4, pp.375–382.
Ahmed, F., Cho, H.J., Kim, J.K., Seong, N.U. and Yeo, Y.K. (2015) ‘A recursive PLS-based soft sensor for prediction of the melt index during grade change operations in HDPE plant’, Korean Journal of Chemical Engineering, Vol. 32, No. 6, pp.1029–1036.
Ahmed, F., Nazir, S. and Yeo, Y.K. (2009) ‘A recursive PLS-based soft sensor for prediction of the melt index during grade change operations in HDPE plant’, Korean Journal of Chemical Engineering, Vol. 26, No. 1, pp.14–20.
Askari, S., Montazerin, N., Fazel Zarandi, M.H. and Hakimi, E. (2017) ‘Generalized entropy based possibilistic fuzzy c-means for clustering noisy data and its convergence proof’, Neurocomputing, Vol. 219, pp.186–202.
Back, A.D. and Weigend, A.S. (1997) ‘A first application of independent component analysis to extracting structure from stock returns’, International Journal of Neural Systems, Vol. 8, No. 4, pp.473–484.
Baffi, G., Martin, E.B. and Morris, A.J. (1999) ‘Non-linear projection to latent structures revisited: the quadratic PLS algorithm’, Computers and Chemical Engineering, Vol. 23, pp.395–411.
Bao, L., Yuan, X.F. and Ge, Z.Q. (2015) ‘Co-training partial least squares model for semi-supervised soft sensor development’, Chemometrics and Intelligent Laboratory Systems, Vol. 147, pp.75–85.
Bemporad, A., Garulli, A., Paoletti, S. and Vicino, A. (2003) ‘A greedy approach to identification of piecewise affine models’, Hybrid Systems: Computation and Control, Vol. 2623, pp.97–112.
Ben Ishak, A. (2016) ‘Variable selection using support vector regression and random forests: a comparative study’, Intelligent Data Analysis, Vol. 20, pp.83–104.
Bishop, C.M. (2006) Pattern Recognition and Machine Learning, Springer-Verlag, New York.
Breiman, L. (1996) ‘Bagging predictors’, Machine Learning, Vol. 24, No. 2, pp.123–140.
Breschi, V., Piga, D. and Bemporad, A. (2016) ‘Piecewise affine regression via recursive multiple least squares and multicategory discrimination’, Automatica, Vol. 70, pp.155–162.
Cao, D.S., Xu, Q.S., Liang, Y.Z., Zhang, L.X. and Li, H.D. (2010) ‘The boosting: a new idea of building models’, Chemometrics and Intelligent Laboratory Systems, Vol. 100, pp.1–11.
Cheng, C. and Chiu, M.S. (2004) ‘A new data-based methodology for nonlinear process modeling’, Chemical Engineering Science, Vol. 59, pp.2701–2810.
Cheng, C. and Chiu, M.S. (2005) ‘Nonlinear process monitoring using JITL-PCA’, Chemometrics and Intelligent Laboratory Systems, Vol. 76, pp.1–13.
Chiplunkar, R. and Huang, B. (2019) ‘Output relevant slow feature extraction using partial least squares’, Chemometrics and Intelligent Laboratory Systems, Vol. 191, pp.148–157.
Ding, Z.Y., Zhang, J. and Liu, Y. (2017) ‘Ensemble non-Gaussian local regression for industrial silicon content prediction’, ISIJ International, Vol. 57, No. 11, pp.2022–2027.
Du, W.Y., Fan, Y.P. and Zhang, Y.W. (2017) ‘Multimode process monitoring based on data-driven method’, Journal of the Franklin Institute-Engineering and Applied Mathematics, Vol. 345, No. 6, pp.2613–2627.
Duchanoy, C.A., Moreno-Armendariz, M.A., Urbina, L., Cruz-Villar, C.A., Calvo, H. and Rubio, J.D. (2017) ‘A novel recurrent neural network soft sensor via a differential evolution training algorithm for the tire contact patch’, Neurocomputing, Vol. 235, pp.71–82.
DuyTrinh, N., Yin, S.H., Tan, N.N., Son, P.X. and Duc, L.A. (2019) ‘A new method for online monitoring when grinding Ti-6Al-4V alloy’, Materials and Manufacturing Processes, Vol. 34, No. 1, pp.39–53.
Ellis, M., Durand, H. and Christofides, P.D. (2014) ‘A tutorial review of economic model predictive control methods’, Journal of Process Control, Vol. 24, No. 8, pp.1156–1178.
Eyvazian, M., Noorossana, R., Saghaei, A. and Amiri, A. (2011) ‘Phase II monitoring of multivariate multiple linear regression profiles’, Quality and Reliability Engineering International, Vol. 27, No. 3, pp.281–296.
Facco, P., Largoni, M., Tomba, E., Bezzo, F. and Barolo, M. (2014) ‘Transfer of process monitoring models between plants: batch systems’, Chemical Engineering Research and Design, Vol. 92, No. 2, pp.273–284.
Facco, P., Tomba, E., Bezzo, F. and García-Muñoz, S. (2012) ‘Transfer of process monitoring models between different plants using latent variable techniques’, Industrial and Engineering Chemistry Research, Vol. 51, No. 21, pp.7327–7339.
Fan, J.C., Qin, S.J. and Wang, Y.Q. (2014) ‘Online monitoring of nonlinear multivariate industrial processes using filtering KICA-PCA’, Control Engineering Practice, Vol. 22, pp.205–216.
Fezai, R., Mansouri, M., Taouali, O., Harkat, M.F. and Bouguila, N. (2018) ‘Online reduced kernel principal component analysis for process monitoring’, Journal of Process Control, Vol. 61, pp.1–11.
Fezai, R., Taouali, O., Harkat, M.F. and Bouguila, N. (2016) ‘A new fault detection method for nonlinear process monitoring’, International Journal of Advanced Manufacturing Technology, Vol. 87, Nos. 9–12, pp.3425–3436.
Frigieri, E.P., Campos, P.H.S., Paiva, A.P. and Balestrassi, P.P. (2016) ‘A mel-frequency cepstral coefficient-based approach for surface roughness diagnosis in hard turning using acoustic signals and Gaussian mixture models’, Applied Acoustics, Vol. 113, pp.230–237.
Ge, Z.Q. (2016) ‘Supervised latent factor analysis for process data regression modeling and soft sensor application’, IEEE Transactions on Control Systems Technology, Vol. 24, No. 3, pp.1004–1011.
Ge, Z.Q. (2017) ‘Review on data-driven modeling and monitoring for plant-wide industrial processes’, Chemometrics and Intelligent Laboratory Systems, Vol. 171, pp.16–25.
Ge, Z.Q. and Song, Z.H. (2012) ‘Multivariate statistical process monitoring using modified factor analysis and its application’, Journal of Chemical Engineering of Japan, Vol. 45, No. 10, pp.829–839.
Ge, Z.Q. and Song, Z.H. (2014) ‘Ensemble independent component regression models and soft sensing application’, Chemometrics and Intelligent Laboratory Systems, Vol. 130, pp.115–122.
Ge, Z.Q., Song, Z.H., Ding, S.X. and Huang, B. (2017) ‘Data mining and analytics in the process industry: the role of machine learning’, IEEE Access, Vol. 5, pp.20590–20616.
Geladi, P. and Kowalski, B.R. (1986) ‘Partial least-squares regression: a tutorial’, Analytica Chimica Acta, Vol. 185, pp.1–17.
Gonzalez, R., Huang, B. and Lau, E. (2015) ‘Process monitoring using kernel density estimation and Bayesian networking with an industrial case study’, ISA Transactions, Vol. 58, pp.330–347.
Gopakumar, V., Tiwari, S. and Rahman, I. (2018) ‘A deep learning based data driven soft sensor for bioprocesses’, Biochemical Engineering Journal, Vol. 136, pp.28–39.
Guo, W., Pan, T.H. and Li, Z.M. (2016) ‘Development of a soft sensor for processes with multiple operating regimes using adaptive multi-state partial least squares regression’, Journal of the Taiwan Institute of Chemical Engineers, Vol. 67, pp.20–28.
Hanes, R.J. and Bakshi, B.R. (2015) ‘Sustainable process design by the process to planet framework’, AIChE Journal, Vol. 61, No. 10, pp.3320–3331.
Hapfelmeier, A. and Ulm, K. (2013) ‘A new variable selection approach using random forests’, Computational Statistics and Data Analysis, Vol. 60, pp.50–69.
Hu, Y., Ma, H.H. and Shi, H.B. (2013) ‘Enhanced batch process monitoring using just-in-time-learning based kernel partial least squares’, Chemometrics and Intelligent Laboratory Systems, Vol. 123, pp.15–27.
Hu, Z.K., Chen, Z.W., Gui, W.H. and Jiang, B. (2014) ‘Adaptive PCA based fault diagnosis scheme in imperial smelting process’, ISA Transactions, Vol. 53, No. 5, pp.1446–1455.
Huang, J.Y. and Zhao, J. (2018) ‘Identification of multi-model LPV model with two scheduling variables using transition test’, International Journal of Modelling, Identification and Control, Vol. 29, No. 1, pp.31–43.
Hyvärinen, A. and Oja, E. (2000) ‘Independent component analysis: algorithms and applications’, Neural Networks, Vol. 13, Nos. 4–5, pp.411–430.
Jadhav, S., Nalbalwar, S. and Ghatol, A. (2014) ‘Feature elimination based random subspace ensembles learning for ECG arrhythmia diagnosis’, Soft Computing, Vol. 18, No. 3, pp.579–587.
Jiang, Q.C. and Yan, X.F. (2013) ‘Weighted kernel principal component analysis based on probability density estimation and moving window and its application in nonlinear chemical process monitoring’, Chemometrics and Intelligent Laboratory Systems, Vol. 127, pp.212–131.
Jiang, Q.C., Yan, X.F. and Huang, B. (2016) ‘Performance-driven distributed PCA process monitoring based on fault-relevant variable selection and Bayesian inference’, IEEE Transactions on Industrial Electronics, Vol. 63, No. 1, pp.377–386.
Jiang, Q.C., Yan, X.F. and Huang, B. (2019) ‘Review and perspectives of data-driven distributed monitoring for industrial plant-wide processes’, Industrial and Engineering Chemistry Research, Vol. 58, pp.12899–12912.
Jing, S.X., Pan, T.H. and Li, Z.M. (2017) ‘Variable knot-based spline approximation recursive Bayesian algorithm for the identification of Wiener systems with process noise’, Nonlinear Dynamics, Vol. 90, No. 4, pp.2293–2303.
Kadlec, P., Gabrys, B. and Strandt, S. (2009) ‘Data-driven soft sensors in the process industry’, Computers and Chemical Engineering, Vol. 33, pp.795–814.
Kadlec, P., Grbic, R. and Gabrys, B. (2011) ‘Data-driven soft sensors in the process industry’, Computers and Chemical Engineering, Vol. 35, No. 1, pp.1–24.
Kaneko, H. and Funatsu, K. (2014) ‘Adaptive soft sensor based on online support vector regression and Bayesian ensemble learning for various states in chemical plants’, Chemometrics and Intelligent Laboratory Systems, Vol. 137, pp.57–66.
Kaneko, H. and Funatsu, K. (2016) ‘Ensemble locally weighted partial least squares as a just-in-time modeling method’, AIChE Journal, Vol. 62, No. 3, pp.717–725.
Kang, S. (2018) ‘On effectiveness of transfer learning approach for neural network-based virtual metrology modeling’, IEEE Transactions on Semiconductor Manufacturing, Vol. 31, No. 1, pp.149–155.
Khatibisepehr, S., Huang, B. and Khare, S. (2013) ‘Design of inferential sensors in the process industry: a review of Bayesian methods’, Journal of Process Control, Vol. 23, pp.1575–1596.
Khediri, I.B., Weihs, C. and Liman, M. (2012) ‘Kernel k-means clustering based local support vector domain description fault detection of multimodal processes’, Expert Systems with Applications, Vol. 39, No. 2, pp.2166–2171.
Kim, D. and Lee, I.B. (2003) ‘Process monitoring based on probabilistic PCA’, Chemometrics and Intelligent Laboratory Systems, Vol. 67, pp.109–123.
Kim, Y.I., Kim, D.W., Lee, D. and Lee, K.H. (2004) ‘A cluster validation index for GK cluster analysis based on relative degree of sharing’, Information Sciences, Vol. 168, Nos. 1–4, pp.225–242.
LeBreux, M., Desilets, M. and Lacroix, M. (2012) ‘Control of the ledge thickness in high-temperature metallurgical reactors using a virtual sensor’, Inverse Problems in Science and Engineering, Vol. 20, No. 8, pp.1215–1238.
Lee, J.M., Qin, S.J. and Lee, I.B. (2007) ‘Fault detection of non-linear processes using kernel independent component analysis’, Canadian Journal of Chemical Engineering, Vol. 85, No. 4, pp.526–536.
Liang, H.B., J, L.Z., Khan, M.J. and Han, J.X. (2019) ‘A sand plug of fracturing intelligent early warning model embedded in remote monitoring system’, IEEE Access, Vol. 7, pp.47944–47954.
Liu, Y. and Zhang, G. (2015) ‘Scale-sifting multiscale nonlinear process quality monitoring and fault detection’, Canadian Journal of Chemical Engineering, Vol. 93, No. 8, pp.1416–1425.
Liu, Y., Gao, Z.L., Li, P. and Wang, H.Q. (2012) ‘Just-in-time kernel learning with adaptive parameter selection for soft sensor modeling of batch processes’, Industrial and Engineering Chemistry Research, Vol. 51, No. 11, pp.4313–4327.
Liu, Q., Qin, S.J. and Chai, T.Y. (2014) ‘Multiblock concurrent PLS for decentralized monitoring of continuous annealing processes’, IEEE Transactions on Industrial Electronics, Vol. 61, No. 11, pp.6429–6437.
Liu, Y.Q., Pan, Y.P., Huang, D.P. and Wang, Q.L. (2017) ‘Fault prognosis of filamentous sludge bulking using an enhanced multi-output Gaussian processes regression’, Control Engineering Practice, Vol. 62, pp.46–54.
Liu, R.S., Peng, M.F. and Xiao, X.H. (2018a) ‘Ultra-short-term wind power prediction based on multivariate phase space reconstruction and multivariate linear regression’, Energies, Vol. 11, No. 10, p.2763.
Liu, Y.Q., Liu, B., Zhao, X.J. and Xie, M. (2018b) ‘A mixture of variational canonical correlation analysis for nonlinear and quality-relevant process monitoring’, IEEE Transactions on Industrial Electronics, Vol. 65, No. 8, pp.6478–6486.
Liu, S.T., Gao, X.W., Qi, W.H. and Zhang, S.M. (2019a) ‘Soft sensor modelling of propylene conversion based on a Takagi-Sugeno fuzzy neural network optimized with independent component analysis and mutual information’, Transactions of the Institute of Measurement and Control, Vol. 41, No. 3, pp.737–748.
Liu, Y., Yang, C., Liu, K.X., Chen, B.C. and Yao, Y. (2019b) ‘Domain adaptation transfer learning soft sensor for product quality prediction’, Chemometrics and Intelligent Laboratory Systems, Vol. 192, p.103813.
Liukkonen, M., Hälikkä, E., Hiltunen, T. and Hiltunen, Y. (2012) ‘Dynamic soft sensors for NOx emissions in a circulating fluidized bed boiler’, Environmental Modelling and Software, Vol. 97, pp.1483–490.
Liukkonen, M., Laakso, I. and Hiltunen, Y. (2013) ‘Advanced monitoring platform for industrial wastewater treatment: multivariable approach using the self-organizing map’, Environmental Modelling and Software, Vol. 48, pp.193–2016.
Lu, L., Yan, J.H. and de Silva, C.W. (2016) ‘Feature selection for ECG signal processing using improved genetic algorithm and empirical mode decomposition’, Measurement, Vol. 94, pp.372–381.
Ma, M.D., Ko, J.W., Wang, S.J., Wu, M.F., Jang, S.S., Shieh, S.S. and Wong, D.S.H. (2009) ‘Development of adaptive soft sensor based on statistical identification of key variables’, Control Engineering Practice, Vol. 17, No. 9, pp.1026–1034.
Ni, W.D., Tan, S.K. and Ng, W.J. (2011) ‘Recursive GPR for nonlinear dynamic process modeling’, Chemical Engineering Journal, Vol. 173, pp.636–643.
Norhayati, I. and Rashid, M. (2018) ‘Adaptive neuro-fuzzy prediction of carbon monoxide emission from a clinical waste incineration plant’, Neural Computing and Applications, Vol. 30, No. 10, pp.3049–3061.
Ogura, T. (2010) ‘A variable selection method in principal canonical correlation analysis’, Computational Statistics and Data Analysis, Vol. 54, No. 4, pp.1117–11237.
Pan, S.J. and Yang, Q.A. (2010) ‘A survey on transfer learning’, IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, pp.1345–1359.
Pan, T.H., Wong, D.S.H. and Jang, S.S. (2010) ‘Development of a novel soft sensor using a local model network with an adaptive subtractive clustering approach’, Industrial and Engineering Chemistry Research, Vol. 49, No. 10, pp.4738–4747.
Peng, J.T., Peng, S.L. and Hu, Y. (2012) ‘Partial least squares and random sample consensus in outlier detection’, Analytica Chimica Acta, Vol. 719, pp.24–29.
Peng, K.X., Zhang, K., You, B. and Dong, J. (2015) ‘Quality-related prediction and monitoring of multi-mode processes using multiple PLS with application to an industrial hot strip mill’, Neurocomputing, Vol. 168, pp.1094–1103.
Piepho, H.P. (2009) ‘Ridge regression and extensions for genomewide selection in maize’, Crop Science, Vol. 49, No. 4, pp.1165–1176.
Pilario, K.E.S. and Cao, Y. (2018) ‘Canonical variate dissimilarity analysis for process incipient fault detection’, IEEE Transactions on Industrial Informatics, Vol. 14, No. 12, pp.5308–5315.
Poerio, D.V. and Brown, S.D. (2018) ‘A frequency-localized recursive partial least squares ensemble for soft sensing’, Journal of Chemometrics, Vol. 32, No. 5, p.e2999.
Pooyan, N., Shahbazian, M., Salahshoor, K. and Hadian, M. (2015) ‘Simultaneous fault diagnosis using multi class support vector machine in a dew point process’, Journal of Natural Gas Science and Engineering, Vol. 23, pp.373–379.
Qi, C.K., Li, H.X., Li, S.Y., Zhao, X.C. and Gao, F. (2014) ‘A fuzzy-based spatio-temporal multi-modeling for nonlinear distributed parameter processes’, Applied Soft Computing, Vol. 25, pp.309–321.
Qin, S.J. (1998) ‘Recursive PLS algorithms for adaptive data modeling’, Control Engineering Practice, Vol. 22, pp.503–514.
Qin, S.J. (2012) ‘Survey on data-driven industrial process monitoring and diagnosis’, Annual Reviews in Control, Vol. 36, pp.220–234.
Quiñones-Grueiro, M., Prieto-Moreno, A., Verde, C. and Llanes-Santiago, O. (2019) ‘Data-driven monitoring of multimode continuous processes: a review’, Chemometrics and Intelligent Laboratory Systems, Vol. 189, pp.56–71.
Rasmussen, C.E. and Williams, C.K. (2006) Gaussian Processes for Machine Learning, MIT Press, Cambridge, Mass.
Raveendran, R., Kodamana, H. and Huang, B. (2018) ‘Process monitoring using a generalized probabilistic linear latent variable model’, Automatica, Vol. 96, pp.73–83.
Rinnan, A., Andersson, M., Ridder, C. and Engelsen, S.B. (2014) ‘Recursive weighted partial least squares (RPLS): an efficient variable selection method using PLS’, Journal of Chemometrics, Vol. 28, No. 5, pp.439–447.
Rosipal, R. and Trejo, L.J. (2001) ‘Kernel partial least squares regression in reproducing kernel Hilbert space’, Journal of Machine Learning Research, Vol. 2, No. 2, pp.97–123.
Rothman, M.J., Rothman, S.I. and Beals, J. (2013) ‘Development and validation of a continuous measure of patient condition using the electronic medical record’, Journal of Biomedical Informatics, Vol. 46, No. 5, pp.837–848.
Salaken, S.M., Khosravi, A., Thanh, T. and Nahavandi, S. (2017) ‘Extreme learning machine based transfer learning algorithms: a survey’, Neurocomputing, Vol. 267, pp.516–524.
Shang, C., Yang, F., Huang, D.X. and Lyu, W.X. (2014) ‘Data-driven soft sensor development based on deep learning technique’, Journal of Process Control, Vol. 24, pp.223–233.
Shao, S.Y., McAleer, S., Yan, R.Q. and Baldi, P. (2019) ‘Highly accurate machine fault diagnosis using deep transfer learning’, IEEE Transactions on Industrial Informatics, Vol. 15, No. 4, pp.2446–2455.
Shao, W.M. and Tian, X.M. (2015) ‘Adaptive soft sensor for quality prediction of chemical processes based on selective ensemble of local partial least squares models’, Chemical Engineering Research and Design, Vol. 95, pp.113–132.
Smith, C. and Jin, Y.C. (2014) ‘Evolutionary multi-objective generation of recurrent neural network ensembles for time series prediction’, Neurocomputing, Vol. 143, pp.302–311.
Souza, F.A.A., Araújo, R. and Mendes, J. (2016) ‘Review of soft sensor methods for regression applications’, Chemometrics and Intelligent Laboratory Systems, Vol. 152, pp.69–79.
Souza, F.A.A., Araújo, R., Matias, T. and Mendes, J. (2013) ‘A multilayer-perceptron based method for variable selection in soft sensor design’, Journal of Process Control, Vol. 23, pp.1371–1378.
Stoica, P., Selen, Y. and Jian, L. (2004) ‘Multi-model approach to model selection’, Digital Signal Processing, Vol. 14, No. 5, pp.399–412.
Su, Q.L., Hermanto, M.W., Braatz, R.D. and Chiu, M.S. (2016) ‘Just-in-time-learning based extended prediction self-adaptive control for batch processes’, Journal of Process Control, Vol. 43, pp.1–9.
Sun, Q.Q. and Ge, Z.Q. (2019) ‘Probabilistic sequential network for deep learning of complex process data and soft sensor application’, IEEE Transactions on Industrial Informatics, Vol. 15, No. 5, pp.2700–2709.
Tamboli, D. and Chile, R. (2018) ‘Multi-model approach for 2-DOF control of nonlinear CSTR process’, International Journal of Modelling, Identification and Control, Vol. 30, No. 2, pp.143–161.
Teslić, L., Hartmann, B., Nelles, O. and Škrjanc, I. (2011) ‘Nonlinear system identification by Gustafson-Kessel fuzzy clustering and supervised local model network learning for the drug absorption spectra process’, IEEE Transactions on Neural Networks, Vol. 22, No. 12, pp.1941–1951.
Tomba, E., Meneghetti, N., Facco, P. and Zelenkova, T. (2014) ‘Transfer of a nanoparticle product between different mixers using latent variable model inversion’, AIChE Journal, Vol. 60, No. 1, pp.123–135.
Wang, B. and Mao, Z.Z. (2019) ‘Outlier detection based on Gaussian process with application to industrial processes’, Applied Soft Computing, Vol. 76, pp.505–516.
Wang, F.L., Tan, S., Peng, J. and Chang, Y.Q. (2012) ‘Process monitoring based on mode identification for multi-mode process with transitions’, Chemometrics and Intelligent Laboratory Systems, Vol. 110, No. 1, pp.144–155.
Wang, R., Wang, X.Y., Sun, H. and Huang, Y.T. (2019a) ‘Analysis of estimator and energy consumption with multiple faults over the distributed integrated WSN’, International Journal of Modelling, Identification and Control, Vol. 32, No. 2, pp.154–168.
Wang, X.D., Kvaal, K. and Ratnaweera, H. (2019b) ‘Explicit and interpretable nonlinear soft sensor models for influent surveillance at a full-scale wastewater treatment plant’, Journal of Process Control, Vol. 77, pp.1–6.
Wold, S., Kettaneh-Wold, N. and Skagerberg, B. (1989) ‘Non-linear PLS modelling’, Chemometrics and Intelligent Laboratory Systems, Vol. 7, pp.53–65.
Wu, H. and Zhao, J.S. (2018) ‘Deep convolutional neural network model based chemical process fault diagnosis’, Computers and Chemical Engineering, Vol. 115, pp.185–197.
Yan, Z.B., Kuang, T.H. and Yao, Y. (2017) ‘Multivariate fault isolation of batch processes via variable selection in partial least squares discriminant analysis’, ISA Transactions, Vol. 70, pp.389–399.
Yao, L. and Ge, Z.Q. (2017) ‘Online updating soft sensor modeling and industrial application based on selectively integrated moving window approach’, IEEE Transactions on Instrumentation and Measurement, Vol. 66, No. 8, pp.1985–1993.
Yuan, X.F., Ge, Z.Q. and Song, Z.H. (2014a) ‘Locally weighted kernel principal component regression model for soft sensing of nonlinear time-variant processes’, Industrial and Engineering Chemistry Research, Vol. 53, No. 35, pp.13736–13749.
Yuan, X.F., Ge, Z.Q. and Song, Z.H. (2014b) ‘Soft sensor model development in multiphase/multimode processes based on Gaussian mixture regression’, Chemometrics and Intelligent Laboratory Systems, Vol. 138, pp.97–109.
Yuan, X.F., Ge, Z.Q., Song, Z.H., Wang, Y.L., Yang, C.H. and Zhang, H.W. (2017) ‘Soft sensor modeling of nonlinear industrial processes based on weighted probabilistic projection regression’, IEEE Transactions on Instrumentation and Measurement, Vol. 66, No. 4, pp.837–845.
Yuan, X.F., Huang, B. and Wang, Y.L. (2018) ‘Deep learning-based feature representation and its application for soft sensor modeling with variable-wise weighted SAE’, IEEE Transactions on Industrial Informatics, Vol. 14, No. 7, pp.3236–3243.
Zhai, L.R., Zhang, Y.W., Guan, S.P., Fu, Y.J. and Feng, L. (2018) ‘Nonlinear process monitoring using kernel nonnegative matrix factorization’, Canadian Journal of Chemical Engineering, Vol. 96, No. 2, pp.554–563.
Zhang, S., Li, L.J., Yao, L.J., Yang, S.P. and Zou, T. (2018) ‘Data-driven process decomposition and robust online distributed modelling for large-scale processes’, International Journal of Systems Science, Vol. 49, No. 3, pp.449–463.
Zhang, S.M., Zhao, C.H. and Gao, F.R. (2019) ‘Incipient fault detection for multiphase batch processes with limited batches’, IEEE Transactions on Control Systems Technology, Vol. 27, No. 1, pp.103–117.
Zhang, X.M., Kano, M. and Li, Y. (2017) ‘Locally weighted kernel partial least squares regression based on sparse nonlinear features for virtual sensing of nonlinear time-varying processes’, Computers and Chemical Engineering, Vol. 104, pp.164–171.
Zhang, Y. and Qin, S. (2007) ‘Fault detection of nonlinear processes using multiway kernel independent component analysis’, Industrial and Engineering Chemistry Research, Vol. 46, No. 23, pp.7780–7787.
Zhao, L.P., Zhao, C.H. and Gao, F.R. (2014) ‘Regression modeling and quality prediction for multiphase batch processes with inner-phase analysis’, Chemometrics and Intelligent Laboratory Systems, Vol. 135, pp.1–16.
Zhao, L., Peng, T., Xie, Y.F., Yang, C.H. and Gui, W.H. (2017) ‘Recognition of flooding and sinking conditions in flotation process using soft measurement of froth surface level and QTA’, Chemometrics and Intelligent Laboratory Systems, Vol. 169, pp.45–52.
Zhao, Z.G., Li, Q.H., Huang, B., Liu, F. and Ge, Z.Q. (2015) ‘Process monitoring based on factor analysis: probabilistic analysis of monitoring statistics in presence of both complete and incomplete measurements’, Chemometrics and Intelligent Laboratory Systems, Vol. 142, pp.18–27.
Zheng, J.H. and Song, Z.H. (2018) ‘Semisupervised learning for probabilistic partial least squares regression model and soft sensor application’, Journal of Process Control, Vol. 64, pp.123–131.
Zheng, W.J., Liu, Y., Gao, Z.L. and Yang, J.G. (2018) ‘Just-in-time semi-supervised soft sensor for quality prediction in industrial rubber mixers’, Chemometrics and Intelligent Laboratory Systems, Vol. 180, pp.36–41.
Zhou, L., Chen, J.H. and Song, Z.H. (2015) ‘Recursive Gaussian process regression model for adaptive quality monitoring in batch processes’, Mathematical Problems in Engineering, p.761280.
Zhu, J.L., Ge, Z.Q., Song, Z.H. and Gao, F.R. (2018) ‘Review and big data perspectives on robust data mining approaches for industrial process modeling with outliers and missing data’, Annual Reviews in Control, Vol. 46, pp.107–133.
Zhu, Q.Q., Liu, Q. and Qin, S.J. (2017) ‘Concurrent quality and process monitoring with canonical correlation analysis’, Journal of Process Control, Vol. 60, pp.95–103.
Zhu, Z.B., Huan, Z.H. and Palazoglu, A. (2011) ‘Transition process modeling and monitoring based on dynamic ensemble clustering and multiclass support vector data description’, Industrial and Engineering Chemistry Research, Vol. 50, No. 24, pp.13969–13983.
