
2016 12th World Congress on Intelligent Control and Automation (WCICA)

June 12-15, 2016, Guilin, China

Data Mining Applications for Finding Golden Batch Benchmarks and Optimizing
Batch Process Control
Yuelong Su and Fengqin Yu

Yuelong Su is a senior engineer with China National Bluestar (Group) Co., Ltd, Beijing, China. Fengqin Yu is a senior engineer with the Central Research Institute of China Chemical Science and Technology, Beijing, China.

Abstract: This article discusses how MPCA (Multi-way Principal Component Analysis) and MPLS (Multi-way Partial Least Squares) have been used to compress information into low-dimensional spaces and pinpoint the root causes of batch-to-batch differences. From an engineering perspective, this paper focuses on applying MPCA and MPLS to the data analysis of a batch process, combined with operating experience, to find the "golden batch benchmark" that describes the best operation among historical batches. This work includes data pre-treatment, batch process modelling and the decision on the chemical reaction initiation status. Finally, optimized control strategies and batch process improvement are also discussed. Process and control engineers can obtain valuable data analysis and control optimization methods for batch processes from this study.

I. INTRODUCTION

Batch processes play an important role in the production of polymers, pharmaceuticals and biochemicals. Examples are polyether polyols (PPG) and chloroprene rubber, which are high-quality specialty materials and products. According to the batch control standard [1], a finite-duration batch process consists of one or more process stages, which are organized as an ordered set, and a specified recipe of materials. Each process stage consists of an ordered set of one or more process operations; these operations represent major processing activities, including preparing the reactor, charging and reacting. Each process operation can be subdivided into an ordered set of one or more process actions that carry out the processing required by the process operation. A range of quality indexes can be measured at the quality control laboratory for the final product of the batch.
In general, a batch process exhibits some batch-to-batch variation because of errors in charging the recipe of materials, deviations of the process control between operating point and set point, and disturbances to the manipulated variables. To achieve consistent product quality from a batch process, minimizing batch-to-batch variability is important [2]; in particular, temperatures, pressures, agitation and feed rates should be kept under the best and most stable controlled conditions. From an engineering perspective, it is important for process or control engineers to mine multi-way data to find golden benchmarks [3] and optimize batch process control. The objective is to identify the best operating procedure and correct an abnormal operation strategy early enough to avoid out-of-specification product resulting from batch-to-batch variations.

1.1 Approaches for the Analysis of Batch Process Data

As data-driven modelling and statistical methods, PCA (Principal Component Analysis) and PLS (Partial Least Squares) are widely used in engineering and science applications in general [4], and for quality monitoring in particular [5]. Details of PLS algorithms can be found in Geladi and Kowalski [6] and Hoskuldsson [7]. Soft-sensing techniques [8-9] and process fault detection [10-12] are typical data-driven model applications in the chemical industry and in semiconductor etch processes.
On the one hand, regarding multi-way or time-series data of batch processes, many research papers have pointed out that MPCA (Multi-way Principal Component Analysis) and MPLS (Multi-way Partial Least Squares) have been used to compress the information into low-dimensional spaces and pinpoint the root causes of batch-to-batch differences [13-14], which extends the application of PCA and PLS techniques from continuous processes to batch processes. Therefore, batch process performance can be easily analyzed and monitored in the reduced space, and quality predictions can also be made [15].
On the other hand, regarding industrial experience with batch process modelling and monitoring, multivariate analysis was performed on 20 fermentation batches from The Dow Chemical Company San Diego facility [2]; batch scheduling with electric power constraints (ABB) and simultaneous scheduling and dynamic optimization of batch processes (Dow Chemical) have also been researched recently at Carnegie Mellon University [16]; and multivariate SPC charts have been applied to monitor a DuPont batch polymerization reactor [17].

1.2 Challenges for Process or Control Engineers

From an engineering point of view, the challenge for process engineers or control engineers is to understand the underlying phenomena in a way that can be modelled rapidly to aid rational decision making [18]. Chemical industry process modelling, especially for batch processes, has been evolving into a specialized field; the in-depth knowledge needed spans the interface between chemical engineering, applied mathematics and computer science, with specific model-based methods and tools. For process or control engineers, these comprehensive abilities are core competencies for dealing with the inherent complexity of chemical processes and the multi-objective nature of decision-making during the lifecycle of the manufacturing process of chemical products [19-20]. Combined with the nature of multi-way batch process data, which, like time-series data, are large in size, high in dimensionality and necessary to update

978-1-4673-8414-8/16/$31.00 ©2016 IEEE 1058

continuously [21], it is important for process or control engineers to mine time-series data with expanded depth in history and breadth in location in order to discover events, make decisions and analyze causality in the era of big data [22].
In detail, how to apply MPCA and MPLS to process data analysis in batch production, how to combine them with industrial experience to find golden benchmarks that describe the best operation of past batches, and how to optimize control strategies in DCS (Distributed Control Systems) are the solutions to the challenges mentioned above. These are the focus of this paper.
1.3 Research Method

Based on this focus, the research framework and method are illustrated below (Fig. 1).

Fig. 1. Research framework

This framework covers the description of a batch process, multi-way data analysis and batch process modelling; optimizing control strategies and batch process improvement are also discussed. The paper is arranged as follows. Section II discusses the characteristics of a batch process; the requirement analysis is the foundation for modelling and optimization. Section III introduces the data pre-treatment methods for batch processes. Section IV is the most important part of this paper: MPCA and MPLS are used in engineering practice, combined with working experience, to find golden benchmarks. Section V presents the optimized control strategies and their application to a real batch process. Sections VI and VII describe the process improvement after applying the control strategies. Finally, Section VIII discusses personal views on the focuses of academia and industry.
Process and control engineers can obtain valuable data analysis and control optimization methods for batch processes from this study.

II. OVERVIEW OF A BATCH PRODUCTION LINE

2.1 Process Operation

In this batch production line, the process stage (Batch ID: DEP) consists of an ordered set of one or more process operations; the catalyzing reaction operation is shown below (Fig. 2).

Fig. 2. Process operation (catalyzing reaction)

In this process operation, three key process actions are required (Table 1).

Table 1. An ordered set of process actions

Command    Controller   Condition    Mode
Charging   FIC0607      SUM > SET    Automatic
Heating    TICA0610                  Manual
Reacting   PIC0605      PV = SV      Automatic

2.2 Requirement Analysis

The ordered set of key process actions that carry out the processing required by the process operation (Table 1) is explained below (all valves are shown in Fig. 2):
Step 1: This is the charging state. When the SUM value of controller FIC0607 exceeds the SET value, valve 1 can be closed. Charging is then finished; this step is controlled automatically.
Step 2: This is the heating state. When the PV of controller TICA0610 is equal to or greater than the set threshold, valve 2 (steam) can be closed and valve 3 (cooling water) can be opened at the same time. Heating is then finished; this step is controlled manually.
Step 3: This is the reacting state. In this state, the process operation requires the PV of controller PIC0605 to equal the SV, so valve 4 is adjusted automatically according to the controller's PID tuning.
From the above, it is easy to see that batch-to-batch variability in this process operation is caused by heating temperature control that is not under the best and most stable conditions. In other words, because of manual operation, different operators may switch valve 2 and valve 3 at different thresholds. To solve this problem, the best batch reaction initiation status, namely the golden benchmark, should be discovered by applying MPCA and MPLS to mine the multi-way process data.

2.3 Data Acquirement

At present, process data historians are built around DCS (Distributed Control Systems). Data historian systems, such as the PI System from OSIsoft, are real-time databases (RTDB) that replaced strip chart recorders through a standard interface between the RTDB and the DCS. For finding golden benchmarks, data acquirement details are shown as
below (Table 2). Multi-way data for a total of 26 batches, with 3 tags in every batch, are exported from the PI System; the sampling interval for every tag is 1 minute.

Table 2. Data acquirement

Batch ID       Tag                               Start Time   End Time
               … measured value)
(N=1 to 26)    (Temperature controller output)   hh:mm:ss     hh:mm:ss
               (Pressure measured …

Why is data pre-treatment needed before using MPCA or MPLS to analyze a batch process? As shown in Fig. 3, the loading plot obtained by applying MPCA directly to the raw data, which included the 26 batches without pre-processing, cannot tell us anything valuable about this process. Most multivariate statistical methods assume that the batch durations are the same. But in a real industrial production line, batch durations are not fixed because of changes and disturbances to operating conditions. For example, the durations of these 26 batches range from 16.4 h to 22.2 h (Fig. 3). In such situations, data pre-treatment, which is also called synchronization, is necessary.

Fig. 3. PCA result without data pre-processing

3.1 Batch Trajectory Synchronization for MPCA

From an engineering point of view, batch trajectory synchronization should link practical experience and process characteristics to the models (Fig. 4):
• The beginning of the reaction in every batch is the real start time for process analysis. We chose the first time at which PIC0605 changes from a negative to a positive value as the start time of every batch in this process.
• For end-time synchronization, we chose the minimal number of columns among these 26 batches as the column count of the data sample (n=820), because it marks the best duration time among these batches. The extra columns of the other batches are excluded from the data set.

Fig. 4. Batch trajectory synchronization method for MPCA

For example, the individual value plot of TICA0610.PV after batch trajectory synchronization is shown in Fig. 5.
• Through synchronization, more process details can be observed. For example, normal and abnormal batches can be clearly separated. At the same time, we can check the quality records to verify the abnormal batches identified by data analysis. These abnormal batches can also be excluded from the next data pre-processing step.
• For normal batches, the peak values of TICA0610.PV differ from batch to batch. In the next step, by combining the quality indexes of these batches with the operating data, the process control deviations that result in different product can be analysed through MPLS.

Fig. 5. Batch trajectory synchronization results
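The two synchronization rules of Section 3.1 (start every batch at the first negative-to-positive change of PIC0605, then cut all batches to the length of the shortest one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `reaction_start` and `synchronize`, the dictionary layout of a batch, and all sample values are hypothetical, and every tag within a batch is assumed to have the same number of 1-minute samples.

```python
from typing import Dict, List

Batch = Dict[str, List[float]]  # tag name -> samples at a fixed 1-minute interval


def reaction_start(pressure: List[float]) -> int:
    """Return the index of the first positive sample that follows a negative one."""
    seen_negative = False
    for i, value in enumerate(pressure):
        if value < 0:
            seen_negative = True
        elif value > 0 and seen_negative:
            return i
    raise ValueError("pressure never changes from negative to positive")


def synchronize(batches: List[Batch], start_tag: str = "PIC0605.PV") -> List[Batch]:
    """Align every batch at its reaction start, then cut all batches to the shortest length."""
    aligned = []
    for batch in batches:
        start = reaction_start(batch[start_tag])
        # Drop everything recorded before the reaction start, for every tag.
        aligned.append({tag: samples[start:] for tag, samples in batch.items()})
    # End-time synchronization: keep only the columns that every batch has.
    n = min(len(batch[start_tag]) for batch in aligned)
    return [{tag: samples[:n] for tag, samples in batch.items()} for batch in aligned]


# Two made-up batches of unequal duration (illustrative values only).
raw = [
    {"PIC0605.PV": [-2.0, -1.0, 3.0, 4.0, 5.0, 6.0],
     "TICA0610.PV": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]},
    {"PIC0605.PV": [-1.0, 2.0, 3.0, 4.0],
     "TICA0610.PV": [30.0, 31.0, 32.0, 33.0]},
]
synced = synchronize(raw)
```

After this step every batch in `synced` has the same number of columns (3 in this toy example, 820 in the data set of this paper), which is the equal-duration condition that the multi-way methods assume.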
3.2 Data Pre-processing for MPLS

After batch trajectory synchronization is finished, 22 normal batches are included in the data set for analysis. MPLS has been shown to be effective in improving the batch quality of an industrial batch process. There are two key points in data pre-processing for MPLS when quality indexes are combined with process variables (Fig. 6).
• The sampling frequency differs between process variables and quality variables. At this site, every batch of product receives only one final quality test, which provides the result of every quality variable.
• To analyse the impact of every process variable on final product quality, there is a one-to-one correspondence between the process variables of a batch and the quality indexes of that batch. The value of every quality index is the same for every process variable in one batch. For example, we choose "D-Value" (the most important quality index for this product) as "Y"; the three process variables of Batch 1 correspond to one "D-Value" (equal to 1.074) of Batch 1, and the "D-Value" corresponding to the three process variables of Batch 22 is 1.0692.
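The pairing of process trajectories with a single quality value per batch can be sketched as a batch-wise unfolding: each synchronized batch becomes one row of X, and its one "D-Value" becomes the matching entry of Y. This is a minimal sketch, assuming batch-wise unfolding is the arrangement used; the names `unfold_batch` and `build_xy` and the trajectory values are hypothetical, while the two "D-Value" figures are the ones quoted above for Batch 1 and Batch 22.

```python
from typing import Dict, List, Sequence, Tuple

Batch = Dict[str, List[float]]  # tag name -> synchronized samples


def unfold_batch(batch: Batch, tags: Sequence[str]) -> List[float]:
    """Concatenate the trajectories of all tags into one row (batch-wise unfolding)."""
    row: List[float] = []
    for tag in tags:
        row.extend(batch[tag])
    return row


def build_xy(batches: Sequence[Batch], d_values: Sequence[float],
             tags: Sequence[str]) -> Tuple[List[List[float]], List[float]]:
    """Build X (one row per batch) and Y (one quality value per batch)."""
    if len(batches) != len(d_values):
        raise ValueError("each batch needs exactly one quality value")
    x = [unfold_batch(batch, tags) for batch in batches]
    if len({len(row) for row in x}) != 1:
        raise ValueError("batches have different lengths; synchronize them first")
    return x, list(d_values)


# Illustrative trajectories (made up); D values as quoted for Batch 1 and Batch 22.
tags = ["TICA0610.PV", "PIC0605.PV"]
batches = [
    {"TICA0610.PV": [140.0, 143.0, 145.0], "PIC0605.PV": [1.0, 2.0, 3.0]},
    {"TICA0610.PV": [139.0, 142.0, 144.0], "PIC0605.PV": [1.5, 2.5, 3.5]},
]
X, Y = build_xy(batches, [1.074, 1.0692], tags)
```

Each row of `X` then holds all samples of all tags for one batch, so the single quality result of that batch applies to every column in the row, which is the one-to-one correspondence described in the second point.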

Fig. 6. Data pre-processing method for MPLS

Researchers have presented or surveyed the standard mathematical models of MPCA and MPLS, which are effective approaches for batch analysis (Chiang, 2006; Yao, 2009). The purpose of this paper is to illustrate how to use these mathematical models in engineering to solve an industrial batch process problem.

4.1 Software

The batch unfolding multi-way analyses, MPCA and MPLS, are completed using Minitab(TM) version 16.

4.2 Results of MPCA

The MPCA algorithm reduces the number of variables by building linear combinations of them, so that co-linearity can be handled and the dimensionality of the input space can be decreased at the same time.
For the process data of these 22 batches, from time 1 to time 820, the MPCA loadings plot of TICA0610.PV is shown in Fig. 7. It indicates that the samples underwent temperature changes related to different batch groups. This is useful for off-line analysis because the loadings show which time period determines the overall batch-to-batch variation in this case. According to the results, the temperature in the time range from n=30 to 50 is the most important for this batch process, as marked by the red circle in Fig. 7. So from the loading plot, the key time range that determines batch-to-batch variation can be chosen from the "rich data" (time 1 to time 820).
Note that the loading calculation can only be performed when the process variables for the entire batch are measured; the sample data set conforms to this requirement through batch trajectory synchronization for MPCA.

Fig. 7. Results of MPCA for TICA0610.PV

To display the temperature differences between batches, the individual value plot of TICA0610.PV from time 20 to 60, including the key time range from 30 to 50, is shown in Fig. 8. It is clear that the temperature values of different batches at the same time point differ more and more as time elapses.

Fig. 8. Temperature changes from time 20 to time 60

4.3 Finding Golden Batch Benchmarks through MPLS

Through MPCA, the key time range has been detected from the "rich data". Golden batch benchmarks can be identified effectively from this time range through MPLS, because the quality indexes were combined with the process variables in the data pre-processing.
Standardization of the coefficients is usually done to answer the question of which independent variables have a greater effect on the dependent variable in a multiple regression analysis. In this case, the independent variables are the temperature values of every batch in the key time range and the dependent variable is the "D" value. The standardized regression coefficient plot is shown in Fig. 9. The No. 14 independent variable (temperature at time 34) has the greatest effect on the final product quality; because its regression coefficient is positive, a higher temperature at this time point means a greater "D" value in the final product.
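The role of standardized coefficients in Section 4.3 can be illustrated with a one-component PLS1 regression computed on standardized data. This is a hedged sketch, not the Minitab procedure used in this paper: the function names and all data values are hypothetical, only a single latent component is extracted, and no cross-validation is performed. For one component, the coefficient vector is the normalized weight vector w (proportional to X'y) multiplied by the regression of y on the score t = Xw.

```python
import math
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Scale a column to zero mean and unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        raise ValueError("constant column cannot be standardized")
    return [(v - mean) / sd for v in values]


def pls1_standardized_coefficients(x: Sequence[Sequence[float]],
                                   y: Sequence[float]) -> List[float]:
    """Standardized regression coefficients of a one-component PLS1 model."""
    cols = [standardize(col) for col in zip(*x)]  # one list per time point
    ys = standardize(y)
    # Weight vector w is proportional to X'y, then normalized to unit length.
    w = [sum(c * t for c, t in zip(col, ys)) for col in cols]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = Xw, and q is the regression of y on t.
    scores = [sum(w[j] * cols[j][i] for j in range(len(cols))) for i in range(len(ys))]
    q = sum(s * t for s, t in zip(scores, ys)) / sum(s * s for s in scores)
    return [wj * q for wj in w]


# Made-up temperatures of 5 batches at 3 time points, and made-up "D" values.
temps = [
    [144.0, 150.0, 155.0],
    [144.5, 150.5, 154.8],
    [143.8, 151.0, 155.3],
    [144.2, 151.5, 154.9],
    [144.1, 152.0, 155.1],
]
d_values = [1.070, 1.074, 1.078, 1.082, 1.086]
coefs = pls1_standardized_coefficients(temps, d_values)
most_influential = max(range(len(coefs)), key=lambda j: abs(coefs[j]))
```

In this toy data the second time point rises in step with the "D" value, so it receives the largest coefficient, and the positive sign indicates that a higher temperature at that point goes with a greater "D" value. This mirrors the way the coefficient plot in Fig. 9 is read.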

Fig. 9. Standard regression coefficient plot of MPLS

Based on the above, the temperatures at time 34 are compared for the minimum and maximum "D" values in this data set. As shown in Fig. 10, only a small temperature difference separates the "best" quality (D=1.0693) and the "worst" quality (D=1.087) product in these samples.
Analysing the operating conditions for D=1.0693 in detail, the golden batch benchmarks can be found, as shown in Fig. 11. The operator started the heating stage when TICA0610.PV was more than 144 °C and PIC0605.PV was less than 144 kPa.

Fig. 10. Comparison operation values based on "D" values

Fig. 11. Finding golden batch benchmarks

V. BATCH PROCESS CONTROL OPTIMIZATION

After the best batch reaction initiation status has been found, the batch process control optimization should be configured in the DCS (Table 2). In the heating stage, a switching surface for bang-bang control [23] has been determined, and an on-off controller triggers the reaction automatically (Fig. 12).

Table 2. An ordered set of process actions after optimizing

Command    Controller   Condition                 Mode
Charging   FIC0607      SUM > SET                 Automatic
Heating    TICA0610     … 144 & PIC.PV … 190      Automatic
Reacting   PIC0605      PV = SV                   Automatic

Fig. 12. Comparing reaction manual and automatic control

VI. "D" VALUE COMPARISON BEFORE-AND-AFTER OPTIMIZATION

Another 20 batches are chosen for comparison: 10 batches without optimization and 10 batches in which the reaction is triggered automatically according to the golden batch benchmarks. The distributions of the "D" values of these batches with and without optimization are shown in Fig. 13. The average "D" value decreases from 1.0805 to 1.073 after optimization, which means that final batch quality is markedly improved through mining multi-way data, finding golden batch benchmarks and optimizing process control.

Fig. 13. "D" value comparison

In Section VI, we discussed the production quality improvement before and after optimization on only 10 batches; in this section, we compare one year of data from 2013 (before) with 2014 (after) to study continuous improvement. For example, one site of China National Chemical Corporation adopted the above data analysis and automatic control solution in its chloroprene rubber equipment; the production cycle time of chloroprene rubber decreased by about 28.5% from 2013 to 2014, and the monthly details are shown in Fig. 14. This means that the production capacity of this equipment was enhanced by 28.5% in 2014 without any investment; the product value gained from this capacity enhancement is about one million dollars.

In addition to the reduced and stabilized production cycle time, first pass yield (FPY) also improved continuously from September last year to May this year, as shown in Fig. 15.

Fig. 14. Production cycle time comparison

Fig. 15. Continuous improvement of FPY

VIII. CONCLUSIONS

This paper applies the MPCA and MPLS methodology to optimize a batch process. Process engineers and control engineers can obtain valuable data analysis and control optimization information for batch processes from this study.
(1) How to optimize a batch process: combining golden batch benchmarks with automatic control strategies in the DCS can reduce production batch time and improve quality.
(2) Economic benefits: a decrease in batch time means that equipment production capacity is enlarged without new investment, and quality improvement means that the product can be sold at a higher price and with a high customer confidence level.
(3) Value of "big data": if company management asks about the potential benefit of a real-time database (such as the PI System) or of "big data", this paper offers an answer, because the data analysis is combined integrally with engineering practice.

REFERENCES

[1] Instrument Society of America, ISA-88.01-1995 (R2006) Batch Control, Part 1: Models and Terminology, chapter 4 Batch processes and equipment. North Carolina: ISA, 1995, pp. 123–135.
[2] L. H. Chiang, L. Riccardo, J. P. Randy and B. S. Mary, "Industrial Experiences with Multivariate Statistical Analysis of Batch Process Data," Chemometrics and Intelligent Laboratory Systems, vol. 81, no. 2, pp. 109–119, 2006.
[3] E. Keogh and S. Kasetty, "On the need for time series data mining benchmarks: a survey and empirical demonstration," in Proc. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Canada, 2002, pp. 102–110.
[4] I. Jolliffe, Principal component analysis (second ed.), Springer, 2002, pp. 130–135.
[5] J. Jackson, A user's guide to principal components, New York: Wiley-Interscience, 1991, pp. 123–129.
[6] P. Geladi and B. R. Kowalski, "Partial least-squares regression: A tutorial," Analytica Chimica Acta, vol. 185, pp. 1–17, 1986.
[7] A. Hoskuldsson, "PLS regression methods," Journal of Chemometrics, vol. 2, no. 2, pp. 211–228, 1988.
[8] Y. L. Su, F. Q. Yu, J. Zhou and Q. X. Zhang, "Product Moisture Real-time Monitoring Based on Soft-sensing Technique," in Proc. the IEEE International Conference on Information and Automation, 2014, pp. 604–609.
[9] P. Kadlec, B. Gabrys, and S. Strandt, "Data-driven Soft Sensors in the Process Industry," Computers & Chemical Engineering, vol. 33, no. 4, pp. 795–814, 2009.
[10] V. Venkatasubramanian, R. Raghunathan and N. K. Surya, "A Review of Process Fault Detection and Diagnosis," Computers & Chemical Engineering, vol. 27, no. 3, pp. 293–326, 2003.
[11] B. M. Wise, N. B. Gallagher, S. Butler, D. White and G. Barna, "A comparison of principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process," Journal of Chemometrics, vol. 13, no. 3, pp. 379–396.
[12] G. Cherry and S. J. Qin, "Multiblock principal component analysis based on a combined index for semiconductor fault detection and diagnosis," IEEE Transactions on Semiconductor Manufacturing, vol. 19, no. 2, pp. 159–172, 2006.
[13] P. Nomikos and J. F. MacGregor, "Monitoring of batch process using multi-way principal component analysis," AIChE Journal, vol. 40, no. 8, pp. 1361–1375, 1994.
[14] S. Wold, N. Kettaneh, H. Friden and A. Holmberg, "Modelling and Diagnostics of a Batch Process and Analogous Kinetic Experiments," Chemometrics and Intelligent Laboratory Systems, vol. 44, no. 8, pp. 331–340, 1998.
[15] Y. Yao and F. R. Gao, "Survey on Multistage/multiphase Statistical Modeling Methods for Batch Processes," Annual Reviews in Control, vol. 33, no. 2, pp. 172–183, 2009.
[16] I. E. Grossmann, "Advances in Mathematical Programming Models for Enterprise-wide Optimization," Computers & Chemical Engineering, vol. 47, no. 2, pp. 2–8, 2012.
[17] P. Nomikos and J. F. MacGregor, "Multivariate SPC charts for monitoring batch processes," Technometrics, vol. 37, no. 1, pp. 41–59, 1995.
[18] V. Venkatasubramanian, "DROWNING IN DATA: Informatics and Modeling Challenges in a Data-rich Networked World," AIChE Journal, vol. 55, no. 1, pp. 2–8, 2009.
[19] I. E. Grossmann and A. W. Westerberg, "Research challenges in process systems engineering," AIChE Journal, vol. 46, no. 9, pp. 1700–1703, 2000.
[20] K. U. Klatt and M. Wolfgang, "Perspectives for Process Systems Engineering: Personal Views from Academia and Industry," Computers & Chemical Engineering, vol. 33, no. 3, pp. 536–550, 2009.
[21] T. C. Fu, "A Review on Time Series Data Mining," Engineering Applications of Artificial Intelligence, vol. 24, no. 1, pp. 64–81, 2011.
[22] S. J. Qin, "Process Data Analytics in the Era of Big Data," AIChE Journal, vol. 60, no. 9, pp. 3092–3100, 2014.
[23] N. Blakemore and R. Aris, "Studies in optimization-V.: The bang-bang control of a batch reactor," Chemical Engineering Science, vol. 17, no. 8, pp. 591–598, 1962.

