Professional Documents
Culture Documents
AIChE Journal - 2022 - Sharma - A Hybrid Science Guided Machine Learning Approach For Modeling Chemical Processes A Review PDF
AIChE Journal - 2022 - Sharma - A Hybrid Science Guided Machine Learning Approach For Modeling Chemical Processes A Review PDF
See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Received: 10 August 2021 Revised: 26 December 2021 Accepted: 15 January 2022
DOI: 10.1002/aic.17609
REVIEW ARTICLE
Process Systems Engineering
KEYWORDS
chemical process modeling, data-based model, first-principles model, hybrid modeling,
science-guided machine learning
modeling with ML methodology according to the modeling objectives. applications is the direct hybrid modeling involving the integration of
The latter include, for example, improving the predictions beyond first-principles model with data-based neural networks.9 Psichogios
physical models, downscaling the complexity of physics-based models, and Unger10 combine a first-principles model based on prior process
generating data, quantifying uncertainty, and discovering governing knowledge with a neural network, which serves as an estimator of
equations of the data-based model. unmeasured process parameters that are difficult to model from first
The objective of this article is to present a review and exposition principle. They apply the hybrid model to a fed-batch bioreactor, and
of scientific and engineering literature relating to the hybrid SGML the integrated model has better properties than the standard “black-
approach, and propose a systematic classification of hybrid SGML box” neural network models. In particular, the integrated model is able
models focusing on both science complementing ML models, and ML to interpolate and extrapolate much more accurately, is easier to ana-
complementing science-based models. Section 2 gives a review of the lyze and interpret, and requires significantly fewer training examples.
broad applications of hybrid SGML approach in bioprocessing and Thompson and Kramer11 later demonstrate how to integrate simple
chemical engineering. As the number of reported methodologies and process model and first-principles equations to improve the neural
applications continues to rise significantly, it is hard for a person unfa- network predictions of cell biomass and secondary metabolite in a
miliar with the subject to identify the appropriate approach for a spe- fed-batch penicillin fermentation reactor when trained on sparse and
cific application. This leads to our key focus in Sections 3–5, noisy process data.
beginning with a systematic classification and exposition of hybrid Agarwal12 develops a general qualitative framework for identify-
SGML methodologies in Section 3. Section 4 explains different cate- ing the possible ways of combining neural networks with the prior
gories of applying ML to complement science-based models, discuss knowledge and experience embedded in the available first-principles
their requirements, strengths, and limitations, suggest potential areas models, and discusses the direct hybrid modeling with series or paral-
of applications, and present illustrative examples from chemical lel configuration to combine the outputs of the science-based model
manufacturing. Section 5 focuses on different categories of applying and the ML model. Asprion et al.13 present the term, gray-box model-
scientific principles to complement ML models, together with their ing, for optimization of chemical processes. They consider the case
requirements, strengths, and limitations, as well as their potential where a predictive model is missing for a process unit within a larger
applications and illustrative examples. Section 6 describes the chal- process flow sheet, and use measured operating data to set up hybrid
lenges and opportunities for hybrid SGML approach for modeling models combining physical knowledge and process data. They report
chemical processes. Section 7 summarizes our conclusions. results of optimization using different gray-box models for process
This work differentiates itself from several recent reviews of simulators applied to a cumene process. Actually, in a number of ear-
hybrid modeling in bioprocessing and chemical engineering through lier studies, Bohlin and his coworkers have explored in details the con-
the following contributions: (1) presentation of a broader hybrid cepts of gray-box identification for process control and optimization,
SGML methodology of integrating science-guided and data-based and Bohlin has summarized the concepts, tools, and applications of
models, and not just the direct combinations of first-principles and gray-box hybrid modeling in an excellent book.14
ML models; (2) classification of the hybrid model applications Over the years, we have seen a growing number of applications
according to their methodology and objectives, instead of their areas of hybrid modeling in bioprocessing and chemical engineering as part
of applications; (3) identification of the themes and methodologies of the advances in smart manufacturing.15–17
which have not been explored much in bioprocessing and chemical In their 2021 paper, Sansana et al.16 discuss mechanistic model-
engineering applications, like the use of scientific knowledge to help ing, data-based modeling, hybrid modeling structures, system identifi-
improve the ML model architecture and learning process for more sci- cation methodologies, and applications. They classify their hybrid
entifically consistent solutions; and (4) illustrations of the use of these model into parallel, series, surrogate models (which are simpler mathe-
hybrid SGML methodologies applied to industrial polymer processes, matical representations of more complex models and similar to
such as inverse modeling and science-guided loss which have not reduced-order models that we discuss below), and alternate structures
been applied previously in such applications. (which include gray-box modeling mentioned above). In the alternate
structures, they refer to some applications of semi-mechanistic model
structures where the best hybrid model is selected using optimization
2 | A P P L I C A T I O N S O F H Y B R I D S GM L concepts. They also classify the hybrid models based on some of the
APPROACH IN BIOP RO CESSI NG AND chemical industry applications into analysis of model-plant
C HE M I C A L E N G I N E E R I N G mismatch,17 model transfer, feasibility analysis and predictive mainte-
nance, apart from the previous mentioned applications like process
The integration of science-based models with data-based models has control, monitoring and optimization.
appeared in various fields like fluid mechanics,3 turbulence modeling,4 Von Stosch et al.18 have used the term, hybrid semi-parametric
quantum physics,5 climate science,6 geology,7 and biological modeling, in their 2014 review, and have summarized applications in
8
sciences. bioprocessing for monitoring, control, optimization, scale-up and
This study focuses on applications of hybrid SGML methodologies model reduction. They emphasize that the application of hybrid semi-
in bioprocessing and chemical engineering. Among the earliest parametric techniques does not automatically lead to better results,
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
SHARMA AND LIU 3 of 19
but that rational knowledge integration has potential to significantly A recent patent by Chan et al.66 presents Aspen Technology's
improve model-based process design and operation. approach on asset optimization using integrated modeling, optimiza-
19
Qin and Chiang review the advances in statistical machine lean- tion and artificial intelligence. In a later white paper, Beck and
ing and process data analytics that can provide efficient tools in devel- Munoz67 describe Aspen Technology's current focus on hybrid model-
20
oping future hybrid models. In a latest paper, Qin et al. propose a ing, combining AI and domain expertise to optimize assts. In particular,
statistical learning procedure integrating with process knowledge to based on their application experience in chemical industries, Aspen
handle a challenging problem of developing a predictive model for Tech have classified hybrid models into three categories: AI-driven,
process impurity levels from more than 40 process variables in an first-principles driven and reduced-order models.67 They define an
industrial distillation system. Both studies highlight the power of sta- AI-driven hybrid model as an empirical model based on plant or exper-
tistical machine leaning for developing future hybrid process models. imental data and use first principles, constraints and domain knowl-
A survey of the literature has shown applications of hybrid model- edge to create a more accurate model. Examples of AI-driven models
ing in bioprocesses,21–27 chemical and oil and gas process are inferential sensors or online equipment models. They define a
industries,28–32 and polymer processes33,34 for more accurate and sci- first-principles driven hybrid model as an existing first-principles
entifically consistent predictions. This survey has also shown many model augmented with data and AI to improve model's accuracy and
topical focuses of applications in bioprocessing and chemical engi- predictability, which has seen many applications in bioprocessing and
neering, including process control,35–38 design of experiments,39,40 chemical engineering. Lastly, they define a reduced-order model
process development and scale-up,41,42 process design,43 and where we use ML to create an empirical data-based model based on
optimization.13,44,45 data from first-principles process simulation runs, augmented with
46
In a recent study, Zhou et al. present a hybrid approach for inte- constraints and domain expertise, in order to build a fit-for-purpose
grating material and process design that holds much promise in pro- low-dimensional model that can run more quickly. With reduced-
cess and product design. Cardillo et al.47 demonstrate the importance order models, we can extend the scale of modeling from units to the
of hybrid models in silico production of vaccines to accelerate the plant-wide models that can be deployed faster.
23
manufacturing process. Chopda et al. apply integrated process ana-
lytical techniques, and modeling and control strategies to enable the
continuous manufacturing of monoclonal antibodies. McBride et al.48 3 | A CLASSIFICATION AND EXPOSITION
classify the hybrid modeling applications in different separation pro- O F H Y B R I D S CI E N C E - G U I D E D M A C H I N E
cesses in chemical industry, namely, distillation,49–51 LE A R N I N G M O DE L S
52,53 54–56 57,58
crystallization, extraction, floatation, filtration,59 and
60,61 62
drying. Venkatasubramanian gives an excellent exposition of As we have seen thus far, the majority of work in hybrid model appli-
the current state of development and applications of artificial intelli- cations in bioprocessing and chemical engineering focuses on the
gence in chemical engineering. The author highlights the intellectual direct combination of science-based and data-based models. In this
challenges and rewards for developing the conceptual frameworks for article, we portray a broad perspective of the combination of scientific
hybrid models, mechanism-based causal explanations, domain-specific knowledge and data analysis in bioprocessing and chemical engineer-
knowledge discovery engines, and analytical theories of emergence, ing as inspired by some of the applications in physics and other
and presents examples from optimizing material design and process areas.1,2 We categorize these hybrid SGML applications in chemical
operations. process industry into two major categories, namely, ML compliments
In an excellent edited volume, Glassey and Stosch63 discuss some science and science compliments ML, together with their subcate-
of the key strengths of hybrid modeling in chemical processes, partic- gories based on the methodologies and objectives of hybrid modeling
ularly in the prediction of scientifically consistent results beyond the as illustrated in Figure 1. We also classify the applications in bio-
experimentally tested process conditions, which is crucial for process processing and chemical engineering according to our hybrid SGML
development, scale-up, control and optimization. They also identify approach. We present examples in several areas of SGML which have
some challenges. For example, incorrect fundamental knowledge in a not been explored much thus far, and which have great potential for
science-based model could impose bias on predictions, thus the process improvement and optimization.
underlying assumptions used in a model are important for analysis.
Also, time and accuracy of parameter estimation is critical when
deciding on a hybrid modeling strategy. Kahrs and Marquardt64 dis- 4 | ML COMPLEMENTS SCIENCE
cuss the approach of simplifying the complex hybrid models into
sequence of simpler problems, such as data preprocessing, solving We can integrate a first-principles scientific model with a data-based
nonlinear equations, parameter estimation and building empirical model to improve the model accuracy and consistency. In the follow-
models using ML. ing, we introduce the subcategories of direct hybrid modeling, inverse
Herwing and Portner in their latest book showcase the applica- modeling approach, reducing model complexity, quantifying uncer-
tions of hybrid modeling in digital twins for smart biomanufacturing.65 tainty in the process, and discovering governing equations.
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
4 of 19 SHARMA AND LIU
F I G U R E 1 Classification of
hybrid SGML models
combining first-principles mass and energy balance equations for a modeling strategy of Figure 3 for prediction. This series–parallel com-
series of five fermenters with a data-based, functional link network75 bination or feedback system can improve model predictions
to identify the kinetic parameters of the fermentation reactors trained depending on the application.
by plant data. The hybrid model includes the effect of temperature on Bhutani et al.80 present a definitive study comparing first-princi-
the fermentation kinetics and show good nonlinear approximation ples, data-based and hybrid models applied to an industrial hydro-
capability. Sharma and Liu78 show how to use plant data to estimate cracking process. In particular, they couple a first-principles
kinetic parameters of first-principles models for industrial polyolefin hydrocracking model based on pseudocomponents with data-based
processes. In a recent study Bangi and Kwong79 estimate process neural network models of different configurations of Figures 3–5 that
parameters in hydraulic fracturing process using deep neural network quantify the variations in operating conditions, feed quality and cata-
which are then input to a first-principles model. Finally, we note that lyst deactivation. The neural network component of the hybrid model
as illustrated in Figure 4, we can interchangeably use a science-based either provides updated model parameters in the first-principles pro-
model or a ML model first in the hybrid framework, depending on if cess model connected in series, or correct predictions of the first-
we require to add more features to augment the data set or to esti- principles process models. The hybrid models are able to represent
mate model parameters. the behavior of an industrial hydrocracking unit to provide accurate
and consistent predictions in the presence of process variations and
changing operating scenarios.
4.1.3 | Series–parallel or combined direct hybrid Song et al.81 also apply the direct hybrid model configurations of
model Figures 3–5 to an industrial hydrocracking process and analyze the
strengths and weaknesses of these configurations. They call a model a
Figure 5 shows a combined direct hybrid model, where we use the mechanism-dominated model if the accuracy of its outputs is mainly
steady-state data from the plant to estimate the unknown parameters dominated by the available theoretical knowledge used to develop the
of a science-based process model and then uses the hybrid residual model; and they also call a model a data-dominated model if the
targets, we can predict the operating conditions required to produce unavailable, and some online measurements such as the temperature
that polymer grade using the inverse modeling approach. profile down the reactor and the solvent flow rate are available on a
frequent basis. The dimensionality reduction aspects of PLS facilitates
the development of a multivariate statistical control plot for monitor-
4.4 | Reduced-order models ing the operating performance of the reactors.
Model reduction can be achieved through dimensional reduction
Reduced-order models (ROMs) are simplified models that represent a methods like principal component analysis. Another approach is to apply
complex process in a computationally inexpensive manner, but also the residual combination with ML model for a ROM model, or to build a
maintain high degree of accuracy of prediction in simulating the pro- ML-based surrogate model for the full-order model. Reduced-order
cess. In bioprocessing and chemical engineering, we can apply the models have been called surrogate models in the context of gray-box
ROM methodology to simulate complex processes and then use ML modeling techniques where first-principles models are combined with
models to optimize the processes. See Figure 9. We can use ROMs to data-based optimization techniques. Rogers and Lerapetritou91,92 propose
simulate different scenarios and sensitivities in order to generate pro- the use of surrogate models as reduced-order models that approximate
cess data, which in turn can be combined with ML models to build the feasibility function for a process in order to evaluate the flexibility and
accurate soft sensors to predict quality variables. This approach helps operability of a science-based process model, since it is difficult to directly
to make sure that the ML model is trained on process data with multi- evaluate the feasibility due to black-box constraints.
ple variations which is not possible in a steady plant run. Hence, data- In a recent study, Abdullah et al.93 showcase a data-based reduced-
based sensors will be accurate for any future process optimization, order modeling of nonlinear processes that have time-scale multiplicity to
scale up etc. and it is also easier to deploy such models online. identify the slow process state variables that can be used in a dynamic
In one of the earliest applications of ROM, MacGregor et al.90 model. Agarwal et al.94 use ROM for modeling pressure swing adsorption
apply a PLS (projection to latent squares or partial least squares) ML process where they use a low-dimensional approximation of a dynamic
model of a polyethylene using process data simulated from a process partial differential equation model, which is more computationally effi-
model to develop inferential prediction models for polymer properties. cient. In another study, Kumar et al.45 use a reduced-order steam meth-
This application involves a high-pressure tabular reactor system pro- ane reformer model to optimize furnace temperature distribution. In a
ducing low-density polyethylene, in which all the fundamental poly- recent study, Shafer et al.95 use a reduced-dimensional dynamic model
mer properties are extremely difficult to measure and are usually for the optimal control of air separation unit. The model combines com-
partmentalization to reduce the number of differential equations with
artificial neural networks to quantify the nonlinear input–output relations
within compartments. This wok reduces the size of the differential equa-
tion system by 90%, while limiting the additional error in product purities
to below 1 ppm compared to a full-order stage-by-stage model.
Kumari et al.96 use data-based reduced order methods for compu-
tational fluid dynamic modeling applied to a case study of super criti-
cal carbon dioxide rare event. They propose a k-nearest neighbor
(kNN)-based parametric reduced-order model (PROM) for conse-
quence estimation of rare events to enhance numerical robustness
with respect to parameter change. Recently, many operator-theoretic
modeling identification and model reduction approaches like the
Koopman operators have been applied to integrate first-principles
knowledge into finding relationship among multiple process variables
F I G U R E 8 Hydrogen feed predictions from inverse modeling of in chemical processes. Koopman operator offers great utility in data-
product quality features driven analysis and control of nonlinear and high-dimensional systems.
We quantify the uncertainty of the chemical process model in 4.8 | Hybrid SGML modeling to aid in discovering
predicting the melt index for the industrial HDPE process scientific laws using ML
described in Section 4.1.4. This uncertainty in prediction may
result from the estimated kinetic parameters of the process, which One way in which ML can help science-based modeling is by discover-
propagates to the quality output as well. We simulate the data ing new scientific laws which governs the system. There is a growing
using the chemical process model and calculate the prediction application of ML in physics to rediscover or discover physical laws
89
intervals using a gradient boosting ML model. In this case, we mainly by data-driven discovery of partial differential equations. ML
use the concept of prediction intervals to determine the range of can be used to develop an empirical correlation which can be used as
model prediction. We use the quantile regression loss with gradi- a scientific law in a science-based model, or ML can be used to solve
ent boosting model to predict the prediction intervals.105 We the partial differential equation defining scientific laws as illustrated in
define a lower and an upper quantile according to the desired pre- Figure 13.
diction interval. Figure 12 illustrates the uncertainty in the predic- Rudy et al.106 showcase the discovery of physical laws like the
tion of melt index given by the range of the 90% prediction Navier–Stokes equation and the reaction–diffusion equation in chem-
interval which implies that there is 90% likelihood that the ML ical processes by a sparse regression method governing the PDE by
model prediction will lie in the given range. The resulting RMSE using a system of time series measurements. Langley et al.107 present
value lies within 1.2–1.5, with the standard deviation of melt index the applications of ML in rediscovering some of the chemistry laws,
data equals 5.1. In the figure, we see that the prediction interval is such as the law of definite proportions, law of combining volumes,
the area between the two black lines represented by the upper determination of atomic weights and many others.
quantile (95th percentile) and the lower quantile (5th percentile). Another important application of ML is to discover some of the
From the figure, we see a larger prediction interval that means a thermodynamic laws which can be useful in defining the phase equi-
higher uncertainty in prediction for time less than 100 h compared librium and critical for an accurate science-based process model.
to the later stage because of a more appreciable change in MI in Nentwich et al.108 use data-based mixed adaptive sampling strategy
that interval. Thus, uncertainty quantification (UQ) helps in making to calculate the phase composition, instead of the complex equation-
better process decisions after knowing the error estimate of the of-state models. Thus, ML application can have promising use in dis-
model. covering more accurate physical and chemistry laws that govern the
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
12 of 19 SHARMA AND LIU
5 | S C I E N C E CO M P L I M E N T S M L
Referring to Figure 1, we can also improve ML models using scientific Fuzzy artificial neural networks (ANN) are a class of neural net-
knowledge. We can improve the generalization or extrapolation capa- works which utilize prior scientific knowledge of the system to formu-
bility and reduce the scientific inconsistency of ML models by using late rules mapped on to the structure of the ANN.9,115 The weights of
scientific knowledge in designing the ML models. The scientific knowl- the ANN connecting the process input to output can be connected to
edge can also help in improving the architecture of the data-based ML physical process variables.63 Apart from making the models more sci-
model or the learning process of the ML model and even with the final entifically consistent with prior knowledge, they also reduce computa-
post-processing of the ML model results. tional complexity and provides interpretable results. The use of prior
knowledge also makes them suitable for extrapolation. Fuzzy ANN
have been particularly useful for applications in process control.116
5.1 | Science-guided design Simutis et al. use fuzzy ANN system for industrial bioprocess monitor-
ing and control.117,118 They also illustrate the application of fuzzy
In science-guided design, we choose the model architecture based on ANN process control expert to perform appropriate control actions
scientific knowledge. For a neural network, we can decide the inter- based on process trends for bioprocess optimization and control.119
mediate variables expressed as hidden layers based on scientific Sparse identification of nonlinear dynamics (SINDy) is another
knowledge of the system. This helps in improving the interpretative data-based modeling method that utilizes scientific knowledge for
ability of the models. Figure 14 illustrates a neural network model improving the model performance with the algorithms.109 Bhadiraju
whose architecture like the number of neurons, hidden layers, activa- et al.120 have used the SINDy algorithm to identify the nonlinear
tion layers etc. can be decided by prior scientific knowledge. dynamics of a chemical process system (CSTR). They used sparse
39
In a bioprocess application, Rodriguez-Granrose et al. use the regression in combination with feature selection to identify accurate
design of experiments (DOE) to create and evaluate a neural network models in an adaptive model identification methodology which
architecture. They use DOE to evaluate activation functions and neu- requires much less than data that current methods. In a similar study
rons on each layer to optimize the neural network. In their recent Bhadiraju et al.121 have developed a modified SINDy approach that is
110
study, Wang et al. design their theory-infused neural networks helpful in cases of plant model mismatch and does not require
based on adsorption energy principles for interpretable reactivity pre- retraining and hence computationally less expensive.
diction. The use of the novel neural differential equation111 to solve a
first-principles dynamic system represents a hybrid SGML approach,
where the architecture of ML model is influenced by the system and 5.2 | Science-guided learning
finds applications in continuous time series models and scalable nor-
malizing flows. The derivative of hidden state is parameterized using a Here, we make use of the scientific principles to improve the scientific
neural network and the output of the network is computed using a consistency of data-based models by modifying the machine learning
112
differential equation solver. In a recent study, Jaegher et al. use the process. We do this by modifying the loss function, constraints and
neural differential equation to predict the dynamic behavior of even the initialization of ML models based on scientific laws. Specifi-
electro-dialysis fouling under varying process conditions. In a recent cally, in order to make the ML models physically consistent, we make
application of this theme in chemical process for model predictive the loss function of neural network model incorporate physical con-
113
control, Wu et al. use prior process knowledge to design the recur- straints.2 A loss function in ML measures how far an estimated value
9
rent neural network (RNN) structure. They showcase a methodology is from its true value. A loss function maps decisions to their associ-
to design the RNN structure using prior scientific knowledge of the ated costs. Loss functions are not fixed. They change depending on
system and also employ weight constraints in the optimization prob- the task in hand and the goal to be met. We can define a loss function
114
lem of the RNN training process. Reis et al. discuss the concept of (based on the mean squared error, MSE) of the ML model (LossM) for
incorporation of process-specific structure to improve process fault regression to calculate the difference between the true value (Ytrue)
detection and diagnosis. and the model predicted value (Ypred). Likewise, we can define a loss
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
SHARMA AND LIU 13 of 19
function for a science-based model (LossSC), which is a function of the training and also prevents from reaching a local minimum, which is the
model predicted value (Y_pred) consistent with science-based loss. concept of transfer learning. Thus, we can use the data from a
We include a weighting factor λ to express the relative importance of science-based model to pretrain a ML model based on this concept of
both loss terms. We write the overall loss function (Loss) as: initialization.1,2,7 This concept has been utilized in chemical process
model in the form of process similarity and developing new process
Loss ¼ LossM Y true Y pred þ λLossSC Y pred : ð1Þ models through migration. In particular, Lu et al.122 introduce the con-
cept of process similarity, and classify it into attribute-based and
Figure 15 illustrates the concept of science-guided loss function. model-based similarities. They present a model migration strategy to
A science-guided initialization helps in deriving an initial choice of develop a new process model by taking advantage of an existing base
parameters before a model is trained so that it improves model model, and process attribute information. Adapting existing process
models can allow using fewer experiments for the development of a
new process model, resulting in a saving of time, cost, and effort. They
apply the concept to predict the melt-flow-length in injection molding
and obtain satisfactory results.
In another study on the similar concept, Yan et al.123 use a Bayes-
ian method for migrating a ML Gaussian process regression model.
They showcase an approach of an iterative model migration and pro-
cess optimization for an epoxy catalytic reaction process.
Recently, Kumar et al.124 try to optimize the non-Newtonian fluid
flow for industrial processes like crude oil transportation using a
physics-based loss function for the shear stress calculation for more
accurate flow predictions. In another study on the similar principle,
Pun et al.125 apply physics-informed neural networks for more accu-
rate and transferable atomistic modeling of materials.
F I G U R E 1 7 Science-guided
refinement framework
TABLE 1 Summary of hybrid SGML approach
Science-based
14 of 19
model/
Hybrid SGML modeling knowledge ML model Advantages Limitations Potential applications
ML compliments science (base model: science-based)
Direct hybrid Series Science-based Regression Extrapolation; Limited by data for parameter Kinetic estimation77,78
modeling model parameter estimation, interpolation, Soft sensor86
(SBM) estimation; data scientific knowledge Process optimization75
augmentation dominated Process design51
Process modeling29,76
Parallel SBM Regression Improved accuracy of Scientific consistency depends Process scale-up41
prediction, on SBM, extrapolation; data- Process optimization33,68
interpolation dominated Process control36,71,72
Soft sensor33
Process monitoring68
Predictive maintenance66,74
Series–parallel SBM Regression Higher accuracy, Increased model complexity, Process optimization80,81
interpolation data-dominated Process monitoring and control58
Plant-model mismatch17
Inverse modeling SBM Probabilistic, Computationally Lower generality of the model Product design and development85
regression cheaper inverse Polymer grade change
problem Material design87
Reduced-order models SBM Regression Fast online Higher bias, limited by SBM Process optimization at plant scale45,94
deployment, accuracy Dynamic modeling93,95
reduce model Soft sensor90
complexity Feasibility analysis91,92
Uncertainty quantification SBM Probabilistic Gives real error Limited by SBM assumptions, Process design and development99
estimate and parameters Reaction kinetics100
solution space Feasibility analysis104
Discovering scientific law SBM Regression, System stability Limited by data size/availability Physical chemistry107
probabilistic interpretability Fluid dynamics106
Thermodynamics
phase equilibrium108
Science compliments ML (base model: ML)
Science-guided design Laws or SBM Deep neural network Scientifically Requires deep scientific Dynamic system112
(DNN); neural consistent and knowledge of system Process control113,118
differential interpretable Process monitoring114
equation
Science-guided learning Laws DNN Scientifically Possible lower prediction Process design and development122,123
consistent and accuracy Process monitoring
interpretable Process flow124
Science-guided refinement SBM Probabilistic Less effort in feature Limited by SBM assumptions, Process design
selection parameters Discovery of materials125–128
SHARMA AND LIU
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
SHARMA AND LIU 15 of 19
simultaneously predicting the polymer density correctly within the Data cleaning, preprocessing, feature engineering maybe difficult
physically consistent range of 0.94–0.97 g/c. By contrast, the density in certain cases but may be imperative in science-based model param-
estimates by the ML model alone result in density values greater than eter estimation hence in these cases the hybrid models may increase
1, which is physically inconsistent. the complexity compared to a standalone ML models like Neural Net-
work which may not require feature engineering. Model predictions
Loss ¼ LossM MItrue MIpred þ λLossSC ρðMItrue Þ ρ MIpred : ð2Þ must not only be accurate but also with lower uncertainty which may
be difficult for certain hybrid model methods.
There is a lot of scope of using hybrid SGML methodologies in
chemical process modeling, summarizing here some of the opportuni-
ties and areas where they can be beneficial. As we have seen Hybrid
5.4 | Science-guided refinement SGML models are useful for extrapolation and predicting beyond
operating range. Hence, it will be particularly useful for processes
By science-guided refinement, we mean the post-processing of ML development. Process fault diagnosis and anomaly detection is one
model results based on scientific principles. This post-processing of such area where data-based methods have been used extensively.
results of the ML model using science-based models can be useful to Thus, there is opportunity to combine scientific knowledge as well to
the design and prediction of material structure.116 Thus, the discovery make the anomaly detection process more scientifically consistent.
of materials forms the basis of chemical process development from Table 1 summarizes all the hybrid SGML models and their require-
which the manufacturing process of any compound can be designed. ments, advantages, limitations and potential applications.
This is different than the serial direct hybrid model discussed in
Section 4.1.2. In particular, we use the science-based model to merely
test the scientific consistency of the ML model results. Hautier et al.117 7 | CONC LU SION
use first-principles models based on density functional model to refine
the results of probabilistic ML models to discovery ternary oxides. We present a broad perspective of hybrid modeling with a science-
Figure 17 illustrates the science-guided refinement framework. guided machine learning (SGML) approach and its application in bio-
Another application for science-guided refining is for data genera- processing and chemical engineering. We give a detailed review and
tion. ML techniques like generalized adversarial networks (GAN) are exposition of the hybrid SGML modeling approach and its applica-
useful for generating data in an unsupervised learning. GANs do have tions, and classify the approach into two categories. The first refers to
a problem of high sample complexity2 which can be reduced by incor- the case where a data-based ML model compliments and makes the
porating some science-based constraints and prior knowledge. Cang first-principles science-based model more accurate in prediction, and
et al.118 apply ML models to predict the structure and properties of the second corresponds to the case where scientific knowledge helps
materials and use the results of the ab initio calculations to refine the make the ML model more scientifically consistent. We point out some
ML model results. They generate more imaging data for property pre- of the areas of SGML which have not been explored much in chemical
diction using a convolution neural network and introduce a morphol- process modeling and have potential for further use like in the areas
ogy constraint form scientific principles, while training of the where science can help improve the data-based model by improving
generative models so that it improves the prediction of the structure– the model design, learning and refinement. We also illustrate some of
property model. these applications of the hybrid SGML methodologies for industrial
Thus, some of these methodologies of having science polymer/chemical process improvement.
complimenting ML have much potential for future applications to bio- Thus, based on our review, we recommend that the use of hybrid
processing and chemical engineering. models will perform better than standalone ML for applications like
process development, since they are better at extrapolation while
standalone ML models which can be adequate for prediction in a
6 | C HA L L E N G E S A N D OP P O R T U N I T I ES steady running plant.
OF HY BRID SG ML AP PRO A CH FOR
MODELING CHEMICAL PROCESSES ACKNOWLEDG MENTS
We gratefully acknowledge Aspen Technology, Inc., for their support
Along with all the merits of using the SGML methodology, there are of the Center of Excellence in Process System Engineering in the
challenges as well. Incorrect fundamental knowledge and the assump- Department of Chemical Engineering at Virginia Tech since 2002. We
tions of the science-based first-principles model will lead to inaccurate thank Mr. Antonio Pietri, CEO, Mr. Willie K. Chan, Chief Technology
hybrid model, so it is important for the scientific model to be very Officer, Dr. Steven Qi, Vice President for Training and Customer Sup-
accurate. Lack of engineer/scientists having expertise in both domain port, and Mr. Daniel Clenzi, Director of University Programs of Aspen
knowledge and machine learning. Computation infeasibility in certain Technology, Inc. for their strong support. We would like to thank Pro-
modeling approaches like inverse modeling. fessor Anuj Karpatne for sharing his knowledge of theory-guided data
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
16 of 19 SHARMA AND LIU
science through his course, physics-guided machine learning, at 15. Yang S, Navarathna P, Ghosh S, Bequette BW. Hybrid modeling in
Virginia Tech. the era of smart manufacturing. Comput Chem Eng. 2020;140:
106-874.
16. Sansana J, Joswiak MN, Castillo I, et al. Recent trends on hybrid
AUTHOR CONTRIBUTIONS modeling for Industry 4.0. Comput Chem Eng. 2021;11:107-365.
Niket Sharma: Conceptualization (equal); data curation (equal); formal 17. Chen Y, Ierapetritou M. A framework of hybrid model development
analysis (equal); investigation (equal); methodology (equal); software with identification of plant-model mismatch. AIChE J. 2020;66(10):
e16996.
(equal); validation (equal); writing – original draft (equal); writing –
18. Von Stosch M, Oliveira R, Peres J, de Azevedo SF. Hybrid semi-
review and editing (equal). Y A Liu: Conceptualization (equal); data parametric modeling in process systems engineering: past, present
curation (equal); formal analysis (equal); funding acquisition (equal); and future. Comput Chem Eng. 2014;60:86-101.
investigation (equal); methodology (equal); project administration 19. Qin SJ, Chiang LH. Advances and opportunities in machine learning
for process data analytics. Comput Chem Eng. 2019;126:465-473.
(equal); resources (equal); software (equal); supervision (equal); valida-
20. Qin SJ, Guo S, Li Z, et al. Integration of process knowledge and sta-
tion (equal); visualization (equal); writing – original draft (equal); writ- tistical learning for the Dow data challenge problem. Comput Chem
ing – review and editing (equal). Eng. 2021;153:107451.
21. O'Brien CM, Zhang Q, Daoutidis P, Hu WS. A hybrid mechanistic-
empirical model for in silico mammalian cell bioprocess simulation.
DATA AVAI LAB ILITY S TATEMENT
Metab Eng. 2021;66:31-40.
Data derived from public domain resources.
22. Pinto J, de Azevedo CR, Oliveira R, von Stosch M. A bootstrap-
aggregated hybrid semi-parametric modeling framework for
ORCID bioprocess development. Bioprocess Biosyst Eng. 2019;42(11):1853-
Y. A. Liu https://orcid.org/0000-0002-8050-8343 1865.
23. Chopda V, Gyorgypal A, Yang O, et al. Recent advances in integrated
process analytical techniques, modeling, and control strategies to
RE FE R ENC E S enable continuous biomanufacturing of monoclonal antibodies.
1. Karpatne A, Atluri G, Faghmous JH, et al. Theory-guided data sci- J Chem Technol Biotechnol. 2021. doi:10.1002/jctb.6765
ence: a new paradigm for scientific discovery from data. IEEE Trans 24. Zhang D, Del Rio-Chanona EA, Petsagkourakis P, Wagner J.
Knowl Data Eng. 2017;29(10):2318-2331. Hybrid physics-based and data-driven modeling for bioprocess
2. Willard J, Jia X, Xu S, Steinbach M, Kumar V. Integrating physics- online simulation and optimization. Biotechnol Bioeng. 2019;
based modeling with machine learning: A survey. arXiv preprint 116(11):2919-2930.
arXiv:2003.04919; 2020. 25. Al-Yemni M, Yang RY. Hybrid neural-networks modeling of an enzy-
3. Muralidhar N, Bu J, Cao Z, et al. PhyNet: physics guided neural net- matic membrane reactor. J Chin Inst Eng. 2005;28(7):1061-1067.
works for particle drag force prediction in assembly. Proceedings of 26. Chabbi C, Taibi M, Khier B. Neural and hybrid neural modeling of a
the 2020 SIAM International Conference on Data Mining. Society for yeast fermentation process. Int J Comput Cogn. 2008;6(3):42-47.
Industrial and Applied Mathematics; 2020:559-567. 27. Corazza FC, Calsavara LP, Moraes FF, Zanin GM, Neitzel I. Determi-
4. Bode M, Gauding M, Lian Z, et al. Using physics-informed super- nation of inhibition in the enzymatic hydrolysis of cellobiose using
resolution generative adversarial networks for subgrid modeling in hybrid neural modeling. Braz J Chem Eng. 2005;22(1):19-29.
turbulent reactive flows. arXiv preprint arXiv:1911.11380; 2019. 28. Azarpour A, Borhani TN, Alwi SR, Manan ZA, Mutalib MI. A generic
5. Schütt KT, Kindermans PJ, Sauceda HE, Chmiela S, Tkatchenko A, hybrid model development for process analysis of industrial fixed-
Müller KR. Schnet: a continuous-filter convolutional neural network bed catalytic reactors. Chem Eng Res Des. 2017;117:149-167.
for modeling quantum interactions. arXiv preprint arXiv: 29. Luo N, Du W, Ye Z, Qian F. Development of a hybrid model for
1706.08566; 2017. industrial ethylene oxide reactor. Ind Eng Chem Res. 2012;51(19):
6. Faghmous JH, Kumar V. A big data guide to understanding climate 6926-6932.
change: the case for theory-guided data science. Big Data. 2014;2(3): 30. Zahedi G, Lohi A, Mahdi KA. Hybrid modeling of ethylene to ethyl-
155-163. ene oxide heterogeneous reactor. Fuel Process Technol. 2011;92(9):
7. Karpatne A, Watkins W, Read J, Kumar V. Physics-guided neural net- 1725-1732.
works (pgnn): an application in lake temperature modeling. arXiv pre- 31. Simon LL, Fischer U, Hungerbühler K. Modeling of a three-phase
print arXiv:1710.11431; 2017. industrial batch reactor using a hybrid first-principles neural-network
8. Lee D, Jayaraman A, Kwon JS. Development of a hybrid model for a model. Ind Eng Chem Res. 2006;45(21):7336-7343.
partially known intracellular signaling pathway through correction 32. Bellos GD, Kallinikos LE, Gounaris CE, Papayannakos NG. Modelling
term estimation and neural network modeling. PLoS Comput Biol. of the performance of industrial HDS reactors using a hybrid neural
2020;16(12):e1008472. network approach. Chem Eng Process Process Intensif. 2005;44(5):
9. Baughman DR, Liu YA. Neural Networks in Bioprocessing and Chemical 505-515.
Engineering. Academic Press; 1995. 33. Chang JS, Lu SC, Chiu YL. Dynamic modeling of batch polymeriza-
10. Psichogios DC, Ungar LH. A hybrid neural network-first principles tion reactors via the hybrid neural-network rate-function approach.
approach to process modeling. AIChE J. 1992;38(10):1499-1511. Chem Eng J. 2007;130(1):19-28.
11. Thompson ML, Kramer MA. Modeling chemical process using prior 34. Hinchliffe M, Montague G, Willis M, Burke A. Hybrid approach to
knowledge and neural networks. AIChE J. 1994;40(8):1328-1340. modeling an industrial polyethylene process. AIChE J. 2003;49(12):
12. Agarwal M. Combining neural and conventional paradigms for 3127-3137.
modelling, prediction and control. Int J Syst Sci. 1997;28(1):65-81. 35. Madar J, Abonyi J, Szeifert F. Feedback linearizing control using
13. Asprion N, Böttcher R, Pack R, et al. Gray-box modeling for the opti- hybrid neural networks identified by sensitivity approach. Eng Appl
mization of chemical processes. Chem Ing Tech. 2019;91(3):305-313. Artif Intel. 2005;18(3):343-351.
14. Bohlin TP. Practical Grey-Box Process Identification: Theory and Appli- 36. Simutis R, Lübbert A. Hybrid approach to state estimation for
cations. Springer Science & Business Media; 2006. bioprocess control. Bioengineering. 2017;4(1):21-29.
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
SHARMA AND LIU 17 of 19
37. Cubillos F, Callejas H, Lima EL, Vega MP. Adaptive control using a 58. Hwang TM, Oh H, Choi YJ, Nam SH, Lee S, Choung YK. Develop-
hybrid-neural model: application to a polymerisation reactor. Braz J ment of a statistical and mathematical hybrid model to predict mem-
Chem Eng. 2001;18(1):113-120. brane fouling and performance. Desalination. 2009;247(1–3):
38. Doyle FJ III, Harrison CA, Crowley TJ. Hybrid model-based approach 210-221.
to batch-to-batch control of particle size distribution in emulsion 59. Piron E, Latrille E, Rene F. Application of artificial neural networks
polymerization. Comput Chem Eng. 2003;27(8–9):1153-1163. for crossflow microfiltration modelling: “black-box” and semi-
39. Rodriguez-Granrose D, Jones A, Loftus H, et al. Design of experi- physical approaches. Comput Chem Eng. 1997;21(9):1021-1030.
ment (DOE) applied to artificial neural network architecture enables 60. Zbicinski I, Strumiłło P, Kamin
ski W. Hybrid neural model of thermal
rapid bioprocess improvement. Bioprocess Biosyst Eng. 2021;44(6): drying in a fluidized bed. Comput Chem Eng. 1996;20:S695-S700.
1301-1308. 61. Cubillos FA, Alvarez PI, Pinto JC, Lima EL. Hybrid-neural modeling
40. Brendel M, Marquardt W. Experimental design for the identification for particulate solid drying processes. Powder Technol. 1996;87(2):
of hybrid reaction models from transient data. Chem Eng J. 2008; 153-160.
141(1–3):264-277. 62. Venkatasubramanian V. The promise of artificial intelligence in
41. Bollas GM, Papadokonstadakis S, Michalopoulos J, et al. Using chemical engineering: is it here, finally. AIChE J. 2019;65(2):466-478.
hybrid neural networks in scaling up an FCC model from a pilot plant 63. Glassey J, Von Stosch M, eds. Hybrid Modeling in Process Industries.
to an industrial unit. Chem Eng Process Process Intensif. 2003;42(8– CRC Press; 2018.
9):697-713. 64. Kahrs O, Marquardt W. Incremental identification of hybrid process
42. Von Stosch M, Hamelink JM, Oliveira R. Hybrid modeling as a models. Comput Chem Eng. 2008;32(4–5):694-705.
QbD/PAT tool in process development: an industrial E. coli case 65. Herwig C, Pörtner R, Möller J, eds. Digital Twins: Tools and Concepts
study. Bioprocess Biosyst Eng. 2016;39(5):773-784. for Smart Biomanufacturing. Springer; 2021.
43. Iwama R, Kaneko H. Design of ethylene oxide production process 66. Chan WK, Fischer B, Varvarezos D, Rao A, Zhao H, inventors; Aspen
based on adaptive design of experiments and Bayesian optimization. Technology Inc, assignee. Asset optimization using integrated
J Adv Manuf Process. 2021;3(3):e10085. modeling, optimization, and artificial intelligence. United States Pat-
44. Zhang S, Wang F, He D, Jia R. Batch-to-batch control of particle size ent Application. US 16/434,793. 2020.
distribution in cobalt oxalate synthesis process based on hybrid 67. Beck R, Munoz G. Hybrid Modeling: AI and Domain Expertise Com-
model. Powder Technol. 2012;224:253-259. bine to Optimize Assets; 2020. https://www.aspentech.com/en/
45. Kumar A, Baldea M, Edgar TF. Real-time optimization of an industrial tresources/white-papers/hybrid-modeling-ai-and-domain-expertise-
steam-methane reformer under distributed sensing. Control Eng combine-to-optimize-assets/?src=blog-global-wpt
Pract. 2016;54:140-153. 68. Galvanauskas V, Simutis R, Lübbert A. Hybrid process models for
46. Zhou T, Gani R, Sundmacher K. Hybrid data-driven and mechanistic process optimization, monitoring and control. Bioprocess Biosyst Eng.
modeling approaches for multiscale material and process design. 2004;26(6):393-400.
Engineering. 2021;7(9):1231-1238. 69. Tian Y, Zhang J, Morris J. Modeling and optimal control of a batch
47. Cardillo AG, Castellanos MM, Desailly B, et al. Towards in silico pro- polymerization reactor using a hybrid stacked recurrent neural net-
cess modeling for vaccines. Trends Biotechnol. 2021;39(11):1120- work model. Ind Eng Chem Res. 2001;40(21):4525-4535.
1130. 70. Su HT, McAvoy TJ, Werbos P. Long-term predictions of chemical
48. McBride K, Sanchez Medina EI, Sundmacher K. Hybrid semi- processes using recurrent neural networks: a parallel training
parametric modeling in separation processes: a review. Chem Ing approach. Ind Eng Chem Res. 1992;31(5):1338-1352.
Tech. 2020;92(7):842-855. 71. Hermanto MW, Braatz RD, Chiu MS. Integrated batch-to-batch and
49. Safavi AA, Nooraii A, Romagnoli JA. A hybrid model formulation for nonlinear model predictive control for polymorphic transformation
a distillation column and the on-line optimisation study. J Process in pharmaceutical crystallization. AIChE J. 2011;57(4):1008-1019.
Control. 1999;9(2):125-134. 72. Ghosh D, Hermonat E, Mhaskar P, Snowling S, Goel R. Hybrid
50. Mahalec V. Hybrid modeling of petrochemical processes. Hybrid modeling approach integrating first-principles models with subspace
Modeling in Process Industries. CRC Press; 2018:129-165. identification. Ind Eng Chem Res. 2019;58(30):13533-13543.
51. Mahalec V, Sanchez Y. Inferential monitoring and optimization of 73. Ghosh D, Moreira J, Mhaskar P. Model predictive control embedding
crude separation units via hybrid models. Comput Chem Eng. 2012; a parallel hybrid modeling strategy. Ind Eng Chem Res. 2021;60(6):
45:15-26. 2547-2562.
52. Peroni CV, Parisi M, Chianese A. Hybrid modelling and self-learning 74. Hanachi H, Yu W, Kim IY, Liu J, Mechefske CK. Hybrid data-driven
system for dextrose crystallization process. Chem Eng Res Des. 2010; physics-based model fusion framework for tool wear prediction. Int
88(12):1653-1658. J Adv Manuf Technol. 2019;101(9):2861-2872.
53. Xiong Z, Zhang J. A batch-to-batch iterative optimal control strategy 75. Babanezhad M, Behroyan I, Nakhjiri AT, Marjani A, Rezakazemi M,
based on recurrent neural network models. J Process Control. 2005; Shirazian S. High-performance hybrid modeling chemical reactors
15(1):11-21. using differential evolution based fuzzy inference system. Sci Rep.
54. Zhang S, Chu F, Deng G, Wang F. Soft sensor model development 2020;10(1):1-1.
for cobalt oxalate synthesis process based on adaptive Gaussian 76. Krippl M, Dürauer A, Duerkop M. Hybrid modeling of cross-flow fil-
mixture regression. IEEE Access. 2019;7:118749-118763. tration: predicting the flux evolution and duration of ultrafiltration
55. Nentwich C, Winz J, Engell S. Surrogate modeling of fugacity coeffi- processes. Sep Purif Technol. 2020;248:117064.
cients using adaptive sampling. Ind Eng Chem Res. 2019;58(40): 77. Mantovanelli IC, Rivera EC, Da Costa AC, Maciel FR. Hybrid neural
18703-18716. network model of an industrial ethanol fermentation process consid-
56. Kunde C, Keßler T, Linke S, McBride K, Sundmacher K, Kienle A. ering the effect of temperature. Appl Biochem Biotechnol. 2007;
Surrogate modeling for liquid–liquid equilibria using a parameteriza- 137(1):817-833.
tion of the binodal curve. Processes. 2019;7(10):753. 78. Sharma N, Liu YA. 110th anniversary: an effective methodology for
57. Côte M, Grandjean BP, Lessard P, Thibault J. Dynamic modelling of kinetic parameter estimation for modeling commercial polyolefin
the activated sludge process: improving prediction using neural net- processes from plant data using efficient simulation software tools.
works. Water Res. 1995;29(4):995-1004. Ind Eng Chem Res. 2019;58(31):14209-14226.
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
18 of 19 SHARMA AND LIU
79. Bangi MS, Kwon JS. Deep hybrid modeling of chemical process: 101. Proppe J, Husch T, Simm GN, Reiher M. Uncertainty quantification
application to hydraulic fracturing. Comput Chem Eng. 2020;134: for quantum chemical models of complex reaction networks. Fara-
106696. day Discuss. 2017;195:497-520.
80. Bhutani N, Rangaiah GP, Ray AK. First-principles, data-based, and 102. Parks HL, McGaughey AJ, Viswanathan V. Uncertainty quantifica-
hybrid modeling and optimization of an industrial hydrocracking unit. tion in first-principles predictions of harmonic vibrational frequen-
Ind Eng Chem Res. 2006;45(23):7807-7816. cies of molecules and molecular complexes. J Phys Chem C. 2019;
81. Song W, Du W, Fan C, Yang M, Qian F. Adaptive weighted hybrid 123(7):4072-4084.
modeling of hydrocracking process and its operational optimization. 103. Wang S, Pillai HS, Xin H. Bayesian learning of chemisorption for
Ind Eng Chem Res. 2021;60(9):3617-3632. bridging the complexity of electronic descriptors. Nat Commun.
82. Lima PV, Saraiva PM, Group GP. A semi-mechanistic model building 2020;11(1):6132.
framework based on selective and localized model extensions. Com- 104. Boukouvala F, Ierapetritou MG. Feasibility analysis of black-box pro-
put Chem Eng. 2007;31(4):361-373. cesses using an adaptive sampling kriging-based method. Comput
83. Breiman L. Random forests. Mach Learn. 2001;45(1):5-32. Chem Eng. 2012;36:358-368.
84. Savkovic-Stevanovic J. Neural net controller by inverse modeling for 105. Ghenis M. Quantile regression—from linear regression to deep
a distillation plant. Comput Chem Eng. 1996;20:S925-S930. learning. Accessed August 5, 2021. https://towardsdatascience.
85. Tomba E, Barolo M, García-Muñoz S. In-silico product formulation com/quantile-regression-from-linear-models-to-trees-to-deep-learning-
design through latent variable model inversion. Chem Eng Res Des. af3738b527c3
2014;92(3):534-544. 106. Rudy SH, Brunton SL, Proctor JL, Kutz JN. Data-driven discovery of
86. Bayer B, von Stosch M, Striedner G, Duerkop M. Comparison of partial differential equations. Sci Adv. 2017;3(4):e1602614.
modeling methods for DoE-based holistic upstream process charac- 107. Langley P, Bradshaw GL, Simon HA. Rediscovering chemistry with
terization. Biotechnol J. 2020;15(5):1900551. the BACON system. Machine Learning. Springer; 1983:307-329.
87. Raccuglia P, Elbert KC, Adler PD, et al. Machine-learning-assisted 108. Nentwich C, Engell S. Surrogate modeling of phase equilibrium cal-
materials discovery using failed experiments. Nature. 2016; culations using adaptive sampling. Comput Chem Eng. 2019;126:
533(7601):73-76. 204-217.
88. Liao TW, Li G. Metaheuristic-based inverse design of materials—a 109. Brunton SL, Proctor JL, Kutz JN. Discovering governing equations
survey. J Materiom. 2020;6(2):414-430. from data by sparse identification of nonlinear dynamical systems.
89. Zhou ZH. Ensemble methods: foundations and algorithms. Machine Proc Natl Acad Sci. 2016;113(15):3932-3937.
Learning. Chapman & Hall/CRC; 2012. 110. Wang SH, Pillai HS, Wang S, Achenie LE, Xin H. Infusing theory into
90. MacGregor JF, Skagerberg B, Kiparissides C. Multivariate statistical machine learning for interpretable reactivity prediction. arXiv pre-
process control and property inference applied to low density poly- print arXiv:2103.15210. 2021.
ethylene reactors. Advanced Control of Chemical Processes. IFAC 111. Chen RT, Rubanova Y, Bettencourt J, Duvenaud D. Neural ordinary
Symposia Series; 1992:155-159. differential equations. arXiv preprint arXiv:1806.07366. 2018.
91. Rogers A, Ierapetritou M. Feasibility and flexibility analysis of black- 112. De Jaegher B, Larumbe E, De Schepper W, Verliefde A, Nopens I.
box processes. Part 1: surrogate-based feasibility analysis. Chem Eng Colloidal fouling in electrodialysis: a neural differential equations
Sci. 2015;137:986-1004. model. Sep Purif Technol. 2020;249:116939.
92. Rogers A, Ierapetritou M. Feasibility and flexibility analysis of black- 113. Wu Z, Rincon D, Christofides PD. Process structure-based recurrent
box processes. Part 2: surrogate-based flexibility analysis. Chem Eng neural network modeling for model predictive control of nonlinear
Sci. 2015;137:1005-1013. processes. J Process Control. 2020;89:74-84.
93. Abdullah F, Wu Z, Christofides PD. Data-based reduced-order 114. Reis MS, Gins G, Rato TJ. Incorporation of process-specific structure
modeling of nonlinear two-time-scale processes. Chem Eng Res Des. in statistical process monitoring: a review. J Qual Technol. 2019;
2021;166:1-9. 51(4):407-421.
94. Agarwal A, Biegler LT, Zitney SE. Simulation and optimization of 115. Hayashi Y, Buckley JJ, Czogala E. Fuzzy neural network with fuzzy
pressure swing adsorption systems using reduced-order modeling. signals and weights. Int J Intell Syst. 1993;8(4):527-537.
Ind Eng Chem Res. 2009;48(5):2327-2343. 116. Brown M, Harris CJ. Neurofuzzy Adaptive Modelling and Control.
95. Schäfer P, Caspari A, Kleinhans K, Mhamdi A, Mitsos A. Reduced Prentice Hall; 1994.
dynamic modeling approach for rectification columns based on com- 117. Simutis R, Havlik I, Schneider F, Dors M, Lübbert A. Artificial neural
partmentalization and artificial neural networks. AIChE J. 2019;65(5): networks of improved reliability for industrial process supervision.
e16568. IFAC Proc Vol. 1995;28(3):59-65.
96. Kumari P, Bhadriraju B, Wang Q, Kwon JS. Development of para- 118. Simutis R, Lübbert A. Bioreactor control improves bioprocess perfor-
metric reduced-order model for consequence estimation of rare mance. J Biotechnol J. 2015;10(8):1115-1130.
events. Chem Eng Res Des. 2021;169:142-152. 119. Schubert J, Simutis R, Dors M, Havlik I, Lübbert A. Bioprocess opti-
97. Narasingam A, Kwon JS. Development of local dynamic mode mization and control: application of hybrid modelling. J Biotechnol.
decomposition with control: application to model predictive con- 1994;35(1):51-68.
trol of hydraulic fracturing. Comput Chem Eng. 2017;106: 120. Bhadriraju B, Narasingam A, Kwon JS. Machine learning-based adap-
501-511. tive model identification of systems: application to a chemical pro-
98. Zhang J, Yin J, Wang R. Basic framework and main methods of cess. Chem Eng Res Des. 2019;152:372-383.
uncertainty quantification. Math Probl Eng. 2020;2020:6068203. 121. Bhadriraju B, Bangi MS, Narasingam A, Kwon JS. Operable adaptive
99. Duong PL, Ali W, Kwok E, Lee M. Uncertainty quantification and sparse identification of systems: application to chemical processes.
global sensitivity analysis of complex chemical process using a gen- AIChE J. 2020;66(11):e16980.
eralized polynomial chaos approach. Comput Chem Eng. 2016;90: 122. Lu J, Yao K, Gao F. Process similarity and developing new process
23-30. models through migration. AIChE J. 2009;55(9):2318-2328.
100. Francis-Xavier F, Kubannek F, Schenkendorf R. Hybrid process 123. Yan W, Hu S, Yang Y, Gao F, Chen T. Bayesian migration of Gauss-
models in electrochemical syntheses under deep uncertainty. Pro- ian process regression for rapid process modeling and optimization.
cesses. 2021;9(4):704. Chem Eng J. 2011;166(3):1095-1103.
15475905, 2022, 5, Downloaded from https://aiche.onlinelibrary.wiley.com/doi/10.1002/aic.17609 by Ulsan National Institute Of, Wiley Online Library on [13/11/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
SHARMA AND LIU 19 of 19
124. Kumar A, Ridha S, Narahari M, Ilyas SU. Physics-guided deep neural 128. Cang R, Li H, Yao H, Jiao Y, Ren Y. Improving direct physical proper-
network to characterize non-Newtonian fluid flow for optimal use ties prediction of heterogeneous materials from imaging data via
of energy resources. Expert Syst Appl. 2021;19:115409. convolutional neural network and a morphology-aware generative
125. Pun GP, Batra R, Ramprasad R, Mishin Y. Physically informed artifi- model. Comput Mater Sci. 2018;150:212-221.
cial neural networks for atomistic modeling of materials. Nat
Commun. 2019;10(1):1.
126. Fischer CC, Tibbetts KJ, Morgan D, Ceder G. Predicting crystal
structure by merging data mining with quantum mechanics. Nat How to cite this article: Sharma N, Liu YA. A hybrid science-
Mater. 2006;5(8):641-646. guided machine learning approach for modeling chemical
127. Hautier G, Fischer CC, Jain A, Mueller T, Ceder G. Finding
processes: A review. AIChE J. 2022;68(5):e17609.
nature's missing ternary oxide compounds using machine learn-
ing and density functional theory. Chem Mater. 2010;22(12):
doi:10.1002/aic.17609
3762-3767.