
Received: 21 August 2023 | Revised: 22 December 2023 | Accepted: 22 January 2024

DOI: 10.1002/bit.28668

ARTICLE

Deep hybrid modeling of a HEK293 process: Combining long short-term memory networks with first principles equations

João R. C. Ramos1 | José Pinto1 | Gil Poiares-Oliveira1 | Ludovic Peeters2 | Patrick Dumas2 | Rui Oliveira1

1 LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, Caparica, Portugal
2 GSK, Rixensart, Belgium

Correspondence
Rui Oliveira, LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, 2829-516 Caparica, Portugal.
Email: rmo@fct.unl.pt

Funding information
SFRD, Grant/Award Number: BD14610472019; Fundação para a Ciência e Tecnologia (FCT)

Abstract
The combination of physical equations with deep learning is becoming a promising methodology for bioprocess digitalization. In this paper, we investigate for the first time the combination of long short-term memory (LSTM) networks with first principles equations in a hybrid workflow to describe human embryonic kidney 293 (HEK293) culture dynamics. Experimental data of 27 extracellular state variables in 20 fed-batch HEK293 cultures were collected in a parallel high throughput 250 mL cultivation system in an industrial process development setting. The adaptive moment estimation method with stochastic regularization and cross-validation were employed for deep learning. A total of 784 hybrid models with varying deep neural network architectures, depths, layer sizes and node activation functions were compared. In most scenarios, hybrid LSTM models outperformed classical hybrid feedforward neural network (FFNN) models in terms of training and testing error. Hybrid LSTM models also proved to be less sensitive to data resampling than FFNN hybrid models. As disadvantages, hybrid LSTM models are in general more complex (higher number of parameters) and have a higher computational cost than FFNN hybrid models. The hybrid model with the highest prediction accuracy consisted of an LSTM network with seven internal states connected in series with dynamic material balance equations. This hybrid model correctly predicted the dynamics of the 27 state variables (R^2 = 0.93 in the test data set), including biomass, key substrates, amino acids and metabolic by-products, for around 10 cultivation days.

KEYWORDS
deep hybrid modeling, deep learning, feedforward neural networks, HEK293 cells, long short-term memory network, machine learning

João R. C. Ramos and José Pinto contributed equally to this work.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium,
provided the original work is properly cited.
© 2024 The Authors. Biotechnology and Bioengineering published by Wiley Periodicals LLC.

Biotechnol Bioeng. 2024;1–15. wileyonlinelibrary.com/journal/bit | 1


10970290, 0, Downloaded from https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/10.1002/bit.28668 by Lupin Ltd Lupin Research Park, Wiley Online Library on [12/02/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
2 | RAMOS ET AL.

1 | INTRODUCTION

Human embryonic kidney 293 (HEK293) is a well-known immortalized mammalian cell line of human origin. A major advantage in relation to other mammalian cell lines, such as Chinese Hamster Ovary (CHO) cells, is the ability to express proteins with human post-translational modifications. Research on these cells has taken major strides as scale-up methods have been developed to make protein expression economically competitive (Dumont et al., 2016; Portolano et al., 2014; Swiech et al., 2012). Over the last few years, HEK293 cells were used in the manufacturing of several vaccine products and candidates against SARS-CoV-2 in response to the COVID-19 pandemic (Ren et al., 2020; Sanchez-Felipe et al., 2021; van Doremalen et al., 2020).

Given the ubiquitousness of the HEK293 cell line in academia and industry, the development of reliable dynamic models to support cell line and process digitalization is of critical importance. Bioprocess Digital Twins (DT) based on high-fidelity mathematical models are currently under development in many industrial sites (Udugama et al., 2021). Mechanistic modeling is a traditional approach to develop DTs (e.g., Hartmann et al., 2022; Monteiro et al., 2023). However, mechanistic models of HEK293 cells combining cell growth kinetics and metabolism are still scarce. Nguyen and co-authors developed a mechanistic kinetic model of recombinant adeno-associated virus production in HEK293 cells by triple transfection (Nguyen et al., 2021). This model covers the main kinetic steps, starting from exogenous DNA delivery to the reaction cascade that forms viral proteins and DNA. Joiner and co-authors have reviewed modeling approaches of recombinant adeno-associated virus production in HEK293 cells (Joiner et al., 2022). Helgers and co-authors developed a dynamic model to describe cell growth, metabolism (glucose, lactate, ammonium and amino acids) and HIV virus-like particle production in fed-batch cultivations (Helgers et al., 2022).

With the emergence of machine learning and deep learning in recent years, bioprocess digitalization approaches that rely on this type of model are gaining popularity (Helleckes et al., 2023; Mowbray et al., 2021; Mowbray et al., 2022). Machine learning relies, however, on big data resources that are still lagging in the bioprocess industries (Udugama et al., 2021). A promising approach is to combine machine learning with mechanistic knowledge in hybrid modeling workflows. The combination of both approaches may increase the predictive power of models, improve model transparency and decrease the data dependency in relation to purely data-driven models (Agharafeie et al., 2023; Cuperlovic-Culf et al., 2023; Helleckes et al., 2022; Mukherjee & Bhattacharyya, 2023; von Stosch et al., 2014). Fu and Barford applied the hybrid modeling approach to a hybridoma cell line 6BB expressing a monoclonal antibody (Fu & Barford, 1996). The hybrid model consisted of bioreactor material balance equations (a system of ordinary differential equations (ODEs)) coupled with a feedforward neural network (FFNN) to predict the kinetics of substrate consumption, by-product accumulation, cell growth, product formation as well as cell composition. Teixeira and co-authors applied a similar serial hybrid modeling strategy to BHK-21 cells expressing a fusion glycoprotein IgG1-IL2 (Teixeira et al., 2005; Teixeira et al., 2007). Aehle and co-authors developed a serial hybrid model (FFNN + material balances) for on-line estimation of the viable cell concentration in fed-batch CHO cultures (Aehle et al., 2010). Narayanan and co-authors developed a serial hybrid model (FFNN connected to material balances) for a CHO fed-batch process (81 batches in a 3.5 L bioreactor) (Narayanan et al., 2019). Bayer and co-authors applied the same serial hybrid modeling strategy to analyze data obtained by intensified design of experiments to reduce the validation burden of CHO cultures in a biopharma quality-by-design context (Bayer et al., 2021).

The neural networks research field is rapidly evolving towards complex multilayered data representation architectures (Alzubaidi et al., 2021). Multilayered (deep) FFNNs were proven to better approximate nonlinear functions than three-layer (shallow) FFNNs (e.g., Liang & Srikant, 2016). With a significant delay, hybrid modeling is taking its first steps towards the integration of deep learning into its framework. The training of deep neural networks (DNNs) combined with systems of ODEs may be challenging due to the high central processing unit (CPU) cost associated with the computation of gradients by the sensitivity method. Deep FFNNs coupled with systems of ODEs have recently been investigated by Pinto and co-authors for bioreactor hybrid modeling (Pinto et al., 2022). Deep learning based on the adaptive moment estimation method (ADAM) (Kingma & Ba, 2014) and modified semi-direct sensitivity equations was shown to systematically improve the predictive power of deep hybrid models over their shallow counterparts (Pinto et al., 2022).

A promising network architecture for bioprocess dynamic modeling is the LSTM network, originally proposed by Hochreiter and Schmidhuber (1997). LSTMs are a particular type of recurrent neural network (RNN) that uses several gated units (multilayer cell structure) and a cell state layer, with the ability to approximate complex dynamics (Smagulova & James, 2019). Hansen and co-authors recently applied LSTMs for modeling phosphorus removal in a wastewater treatment plant characterized by complex mixed microbial dynamics (Hansen et al., 2022). Cheng and co-authors reported a multilayered hybrid modeling workflow for a wastewater treatment process that combined a mechanistic model, a convolutional neural network (CNN), an LSTM and a FFNN (Cheng et al., 2021). In this paper, we extend the deep hybrid modeling framework proposed by Pinto et al. (2022) to LSTMs combined in series with dynamic material balance equations. Deep hybrid models based on LSTMs were systematically compared with classical hybrid models based on FFNNs. The proposed hybrid modeling framework was applied to describe HEK293 fed-batch culture dynamics. Experimental data collected in a parallel bioreactor system were used to train the hybrid models. To the best of our knowledge, this is the first study comparing hybrid LSTM structures with the classical hybrid FFNN structures for bioprocess modeling.
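The serial hybrid structure recurring in the works cited above, a neural network for the unknown kinetics connected to bioreactor material balances, can be illustrated with a minimal sketch. The code below is Python (the works above used their own implementations), and the two-species system, network sizes and parameter values are hypothetical, for illustration only:

```python
import numpy as np

def nn_rates(C, W1, b1, W2, b2):
    """Tiny FFNN: concentrations in, specific reaction rates out."""
    h = np.tanh(W1 @ C + b1)       # hidden layer
    return W2 @ h + b2             # linear output layer

def hybrid_step(C, Xv, D, C_in, params, dt):
    """Serial hybrid model: NN kinetics plugged into the material
    balance dC/dt = r(C)*Xv - D*C + D*C_in (explicit Euler step)."""
    r = nn_rates(C, *params)
    return C + (r * Xv - D * C + D * C_in) * dt

# hypothetical 2-species example with random (untrained) weights
rng = np.random.default_rng(0)
params = (0.1 * rng.normal(size=(4, 2)), np.zeros(4),
          0.1 * rng.normal(size=(2, 4)), np.zeros(2))
C = np.array([10.0, 0.0])          # e.g. a substrate and a by-product
for _ in range(24):                # 24 one-hour Euler steps
    C = hybrid_step(C, Xv=1.0, D=0.02, C_in=np.array([10.0, 0.0]),
                    params=params, dt=1.0)
print(C.shape)
```

In the serial arrangement the network output enters the balance only as a rate term, so dilution and feeding are enforced by the mechanistic part regardless of what the network predicts.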

2 | MATERIALS AND METHODS

2.1 | Cell culture and analytics

A GSK proprietary HEK293 cell line and chemically defined culture medium were used for the cell count expansion. Precultures of the cell line were grown in shake-flasks (500 mL) at 37°C and 5% CO2 atmosphere with a shaking frequency of 160 rpm (140 mL of culture). GSK aims at a targeted process scheme that consists of a growth phase (6–7 days) and an adenovirus production phase (3–4 days). The goal of the current set of cultures was to optimize cell growth only. Although cultures were maintained up to 10 days, no viral infection was performed. In total, 20 cell cultures were carried out in 250 mL vessels (Ambr systems), with an initial seed of 0.4 Mcell/mL. The pH was controlled at 7.2 with NaHCO3 7.5% and sparged CO2 together with overlay aeration. The dissolved oxygen (DO) was controlled at 40% by sparging pure oxygen. Stirring was adjusted to around 20 W/m^3. The cultivations were initiated in batch mode and switched to fed-batch mode after 48 h. The basal medium was the same in all cultivations, but the feeding solutions (11 unique solutions) applied changed from one culture to another. The different feeding solutions consisted of mixtures of amino acids, glucose (Glc), glutamine (Gln), pyruvate (Pyr), vitamins and other micronutrients such as selenium and magnesium dichloride (MgCl2). The feeding mixtures were designed using statistical design of experiments (DoE) so as to "excite" the biological system with very diverse process conditions and to collect an information-rich data set. Feedings were carried out once a day in a quasi-simultaneous addition of a bolus of all the feeding solutions involved, with a constant daily feed volume of 8 mL; an additional glucose feed was provided in case its concentration was expected to deplete in the following day (based on the glucose concentration and on a cell-specific consumption rate). Sampling was performed daily, whereby the viable cell density (VCD) and viability were measured using a Vi-Cell cell counter (Beckman, Indianapolis, USA). Samples were also assayed for Glc, lactate (Lac), Pyr, Gln, ammonium (NH4) and lactate dehydrogenase (LDH) with a CedexBio-HT metabolite analyzer (Roche). The remaining metabolites and amino acids were assayed off-line by Nuclear Magnetic Resonance (NMR) spectroscopy at Eurofins Spinnovation (Oss, The Netherlands). According to previous calibrations, the measurement error is approximately 10% for VCD and 20% for the metabolites.

2.2 | Reaction correlation matrix

The reaction correlation matrix, S, was inferred from cell culture data using the state-space reduction method proposed by Pinto and co-authors (Pinto et al., 2023c). Briefly, the data of concentrations, feeding and sampling volumes were used to estimate the cumulative reacted amount over time of each species in the 20 fed-batch experiments (described in Supporting Information S1). The cumulative reacted amount of each species was organized in a two-dimensional data matrix, IR. The rows represent the process time of every fed-batch experiment stacked vertically. The columns represent the 27 measured bioreaction compounds (viable cell count, Xv; glucose, Glc; lactate, Lac; glutamine, Gln; pyruvate, Pyr; glutamate, Glu; ammonium, NH4; alanine, Ala; arginine, Arg; asparagine, Asn; aspartate, Asp; citrate, Cit; cysteine, Cys; formic acid, For; glycerol, Glyc; histidine, His; isoleucine, Ile; leucine, Leu; lysine, Lys; methionine, Met; phenylalanine, Phe; proline, Pro; serine, Ser; threonine, Thr; tryptophan, Trp; tyrosine, Tyr; and valine, Val). A normalized matrix, IR̃, was obtained by dividing each IR column by the respective maximum absolute value, IRmax,

IR̃ = IR ⊘ IRmax,  (1)

with ⊘ the Hadamard (element-wise) division. In the next step, principal component analysis (PCA) was applied to decompose IR̃ into a matrix of scores, Sco, and a matrix of coefficients, Coeff,

IR̃ = Sco × Coeff^T.  (2)

The alternating least-squares PCA algorithm was adopted in MATLAB® for this purpose (function "pca" with the alternating least-squares (ALS) option). Finally, the reaction correlation matrix was obtained by denormalization of the matrix of coefficients,

S = Coeff ⊗ IRmax,  (3)

with ⊗ the Hadamard (element-wise) multiplication. The matrix S contains correlation information between the consumption and/or production of the bioreaction compounds. Inclusion of the reaction correlation matrix significantly improves the model parsimony and predictive power (preliminary study in Supporting Information S1).

2.3 | Deep hybrid modeling method

The models applied to the HEK293 fed-batch cultures are based on the deep hybrid modeling framework proposed by Pinto and co-authors (Pinto et al., 2022). It consists of a deep neural network connected in series with a system of ordinary differential equations (ODEs) (Figure 1). The ODEs were derived from material balance equations of the 27 biochemical species contained in a perfectly mixed bioreactor compartment, taking the following general form:

dC/dt = S r Xv − D C + D Cin,  (4)

with C a vector of 27 concentrations, S a (27 × 7) reaction correlation matrix, r a (7 × 1) vector of specific reaction rates, Xv the viable cells concentration, D the dilution rate and Cin a (27 × 1) vector of concentrations in the feed stream. Due to the discrete time implementation of the LSTM (described below), Equation (4) was implemented in discrete time as follows,

C(t + 1) = C(t) + S r(t) Xv(t) δt − D(t) C(t) δt + D(t) Cin δt,  (5)
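The inference of S via Equations (1)–(3) can be sketched in a few lines. The sketch below is Python rather than the authors' MATLAB `pca` call, and uses a plain SVD-based PCA, which assumes a complete data matrix (the ALS variant used in the paper also handles missing entries):

```python
import numpy as np

def reaction_correlation_matrix(IR, n_pcs=7):
    """Infer S from a cumulative reacted-amounts matrix IR
    (time points x species): column-wise max-abs normalization
    (Eq. 1), PCA (Eq. 2), denormalized coefficients (Eq. 3)."""
    ir_max = np.max(np.abs(IR), axis=0)      # per-species scale
    ir_norm = IR / ir_max                    # Hadamard division, Eq. (1)
    X = ir_norm - ir_norm.mean(axis=0)       # PCA centers the data
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    coeff = Vt[:n_pcs].T                     # (species x n_pcs) loadings
    scores = X @ coeff                       # (time points x n_pcs)
    S = coeff * ir_max[:, None]              # denormalization, Eq. (3)
    return S, scores

# synthetic example: 50 time points, 5 species, rank-2 dynamics
rng = np.random.default_rng(0)
IR = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5))
S, scores = reaction_correlation_matrix(IR, n_pcs=2)
print(S.shape, scores.shape)
```

Each column of the returned S corresponds to one principal component, that is, one lumped pathway in the terminology of Section 3.1.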

with δt the discretization time step. The material balance Equation (5) was abbreviated as ODE(27) in the hybrid model specification.

The specific kinetic rates, r(t), were modeled by a DNN as a function of cultivation time and the concentrations of species, C(t). DNNs were implemented as a sequence of nh + 2 interconnected layers, starting with an input layer, followed by nh hidden layers and a final output layer. The input layer, abbreviated as In(27), always had 27 inputs corresponding to the concentrations, C(t), normalized by their maximum value, Cmax,

I(t) = C(t) ⊘ Cmax.  (5′)

A feedforward layer at hidden position k = 1, …, nh takes the following general form,

y{k}(t) = σ{k}(w{k} · x{k}(t) + b{k}),  (6)

with y{k} denoting the output vector of hidden layer k, x{k} the input vector of hidden layer k, σ{k}(·) the hidden layer activation function, and {w{k}, b{k}} the weights associated with hidden layer k (calculated during the training process). The inputs of hidden layer k are always equal to the outputs of hidden layer k − 1, x{k}(t) = y{k−1}(t), except for hidden layer k = 1, which is excited by the input layer, x{1}(t) = I(t). The activation functions, σ{k}(·), were either the sigmoidal function (Sig), the hyperbolic tangent function (Tanh), the linear function (Lin) or the rectified linear function (ReLU). A feedforward hidden layer with ny outputs is abbreviated as Sig(ny), Tanh(ny), Lin(ny) or ReLU(ny) depending on the activation function.

A peephole LSTM hidden layer (Gers et al., 2000; Hochreiter & Schmidhuber, 1997) was also implemented as an alternative to feedforward layers. A peephole LSTM layer at hidden position k has inputs, x{k}(t), outputs, y{k}(t), and additionally an internal state vector, z{k}(t), with dim(z{k}) = dim(y{k}). The peephole LSTM layer has an internal structure consisting of four interconnected gate layers, namely a forget gate layer,

hf{k}(t) = Sig(wf{k} x{k}(t) + uf{k} z{k}(t) + bf{k}),  (7a)

an input gate layer,

hi{k}(t) = Sig(wi{k} x{k}(t) + ui{k} z{k}(t) + bi{k}),  (7b)

an output gate layer,

ho{k}(t) = Sig(wo{k} x{k}(t) + uo{k} z{k}(t) + bo{k}),  (7c)

and a cell state layer,

hz{k}(t) = Tanh(wz{k} x{k}(t) + bz{k}),  (7d)

with hf{k}, hi{k}, ho{k} and hz{k} denoting the output vectors of the respective gate layers with size dim(y{k}), and {wf{k}, uf{k}, bf{k}, wi{k}, ui{k}, bi{k}, wo{k}, uo{k}, bo{k}, wz{k}, bz{k}} the network weights that need to be optimized during training. The internal state, z{k}(t), has a dynamic update rule defined as,

z{k}(t) = hf{k}(t) ⊗ z{k}(t − 1) + hi{k}(t) ⊗ hz{k}(t),  (7e)

with z{k}(0) = 0. The LSTM outputs are finally calculated as follows,

y{k}(t) = ho{k}(t) ⊗ z{k}(t).  (7f)

A peephole LSTM layer at hidden position k and with ny outputs is abbreviated as LSTM(ny), with the inputs given by the outputs of the preceding layer, x{k}(t) = y{k−1}(t).

The DNN structure is finalized with the output layer corresponding to the specific reaction rates (always 7), abbreviated as Rate(7),

r(t) = y{nh}(t).  (8)

Since there is no generic way to choose the optimal size of a DNN, several configurations (784 hybrid model configurations) were evaluated with different sequences of layers (either feedforward or peephole LSTM), with varying depth and varying numbers of nodes in the hidden layers. When all hidden layers are of the feedforward type, the resulting model is classified as a FFNN hybrid model. When at least one of the hidden layers is a peephole LSTM, the resulting model is classified as a LSTM hybrid model.

FIGURE 1 Deep hybrid model structure. (a) Concentrations of extracellular compounds, C(t), are used as input to the DNN. The DNN computes the specific reaction kinetics at different time points, r(t). The reaction rates of all species are calculated using the reaction correlation matrix (S) and are used in the ODEs to calculate the dynamic concentrations (C(t)). This process is repeated sequentially, wherein estimated concentrations are used in a feedback loop as an input to the hybrid model. (b) Peephole LSTM network to compute specific reaction rates as a function of species concentrations.

2.4 | Deep learning method

The raw data acquired in the Ambr® system was preprocessed to a suitable format for training the hybrid models (details in the
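A minimal forward pass of the peephole LSTM layer of Equations (7a)–(7f) can be sketched as follows. This is an illustrative Python sketch, not the paper's MATLAB implementation, and the random weights stand in for trained ones:

```python
import numpy as np

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

class PeepholeLSTMLayer:
    """Forward pass of Equations (7a)-(7f); names mirror the text."""
    def __init__(self, n_in, n_out, rng):
        w = lambda *shape: 0.1 * rng.normal(size=shape)
        self.wf, self.uf, self.bf = w(n_out, n_in), w(n_out, n_out), w(n_out)
        self.wi, self.ui, self.bi = w(n_out, n_in), w(n_out, n_out), w(n_out)
        self.wo, self.uo, self.bo = w(n_out, n_in), w(n_out, n_out), w(n_out)
        self.wz, self.bz = w(n_out, n_in), w(n_out)
        self.z = np.zeros(n_out)                            # z(0) = 0

    def step(self, x):
        hf = sig(self.wf @ x + self.uf @ self.z + self.bf)  # forget gate, (7a)
        hi = sig(self.wi @ x + self.ui @ self.z + self.bi)  # input gate, (7b)
        ho = sig(self.wo @ x + self.uo @ self.z + self.bo)  # output gate, (7c)
        hz = np.tanh(self.wz @ x + self.bz)                 # cell state layer, (7d)
        self.z = hf * self.z + hi * hz                      # state update, (7e)
        return ho * self.z                                  # output, (7f)

# an LSTM(7) layer fed by a 14-node hidden layer, as in In(27)-Tanh(14)-LSTM(7)
rng = np.random.default_rng(1)
layer = PeepholeLSTMLayer(n_in=14, n_out=7, rng=rng)
y = layer.step(rng.normal(size=14))
print(y.shape)
```

Because z{k}(t) persists between calls, the layer output depends on the whole input history, which is what gives the hybrid LSTM model memory across the cultivation time grid.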

Supporting Information S1). The data of measured concentrations, feeding and sampling events were organized in 3D matrices of concentrations (C) and control inputs (U) (Figure 2). Dimension 1 was the cultivation time, with a fixed time grid between 0 and 240 h and 24 h frequency (linear interpolation was used to synchronize variables to a common grid for all experiments). Dimension 2 was the 27 extracellular species. Dimension 3 indexed the 20 independent bioreactor experiments. The data were partitioned into a training (60%, 12 experiments, 3564 data points), a validation (20%, 4 experiments, 1188 data points) and a testing subset (20%, 4 experiments, 1188 data points). Data partitioning was performed along dimension 3, that is, experiment-wise. Ten random data permutations of training (12)/validation (4)/testing (4) experiments were applied to each model structure, with the corresponding modeling results analyzed statistically. The MATLAB function 'randsample' without replacement was adopted to randomly select the 10 data permutations from the uniform distribution. This step was done only once to generate a single set of permutations that was applied to all models to ensure comparability. The purpose was to avoid a data sampling bias effect, that is, to expose the models to different randomly selected data partitions and to assess the resulting performances statistically.

In the implemented training scheme (Figure 2), the DNN was trained coupled with the material balance equations. The inputs to the DNN were the model-calculated concentrations, C(t). The DNN then calculated the specific rates, r(t). The material balance equations then performed a one-step-ahead prediction of the concentrations, C(t + 1). The C(t + 1) concentrations were fed back to the DNN inputs (feedback loop shown in Figure 2). The error between measured and predicted concentrations was minimized by weighted least squares regression (Figure 2).

A stochastic gradient descent algorithm based on ADAM was adopted to minimize the weighted mean squared error (WMSE) of the training data partition (see Pinto et al., 2022 for details). The ADAM method estimates the network parameters, ω, through the first and second moments of the gradients of the loss function (the WMSE) and a set of hyperparameters α, β1 and β2, representing the step size and the exponential decay of the moment estimates. Default hyperparameter values were adopted, namely ε = 1 × 10^−8, β1 = 0.9, β2 = 0.999 and α = 1 × 10^−3 (Kingma & Ba, 2014), as these values were found to work well for hybrid model training (Pinto et al., 2022; Pinto et al., 2023a, 2023b). The semidirect sensitivity equations were adopted to compute the gradients (see Pinto et al., 2022 for details). Stochastic regularization (weights dropout) and cross-validation were implemented to avoid overfitting. Neural network nodes and the respective weights were randomly dropped out at each training cycle (random selection from the uniform distribution with a weights dropout probability between 0 and 0.45). The choice of the weights dropout probability is discussed in the results section. Cross-validation was also implemented, whereby the final weights were chosen at the iteration with minimal validation WMSE. The corrected Akaike Information Criterion (AICc) was calculated to evaluate the parsimony of the final trained model (tradeoff between training error and model complexity). Further details on the training methodology are provided in the Supporting Information S1. The study was performed in MATLAB® on computers equipped with an Intel® Core i9-9900K CPU with 8 cores, 16 threads and a maximum frequency of 5 GHz. The MATLAB® code is provided in a GitHub repository at the following link: https://github.com/jrcramos/Hybrid-modeling-of-bioreactor-with-LSTM.

FIGURE 2 Supervised deep learning method. Measured data of concentrations are assumed to be corrupted by Gaussian noise. The deep hybrid models are trained in a weighted least squares sense, whereby the WMSE between measured and predicted concentrations of the 27 extracellular species is minimized. The weighting factors were the inverse of the measurement variance at each data point. The semidirect sensitivity method is employed to compute the gradients of the residuals in relation to the DNN weights (w). The ADAM deep learning algorithm is used to optimize the DNN weights (w) in a weighted least squares sense. Weights dropout and cross-validation are employed to avoid overfitting.

3 | RESULTS AND DISCUSSION

3.1 | Reaction correlation matrix

Bioreactor hybrid models typically include a stoichiometric matrix of the cellular metabolism, S. In a problem where a reliable metabolic network exists, S may be derived mechanistically from the metabolic network stoichiometric coefficients. In this study, the stoichiometric matrix was replaced by a reaction correlation matrix inferred from the cultivation data by applying the previously described method. The alternating least-squares PCA algorithm was applied to the cumulative reacted amounts data matrix for a maximum number of seven principal components (PCs). The overall results are summarized in Figure 3. Figure 3a shows that the first PC captures over 92% of the data variance, whereas 7 PCs cumulatively explain over 99% of the data variance. These results evidence strong correlations between the consumption and/or production of the 27 measured compounds.
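The explained-variance profile of the kind shown in Figure 3a corresponds to the normalized squared singular values of the centered data matrix. A short Python sketch, using synthetic data that mimics a single dominant direction rather than the actual cultivation data:

```python
import numpy as np

def explained_variance(X, n_pcs=7):
    """Per-PC and cumulative explained variance (in %), computed from
    the singular values of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2 / np.sum(s**2) * 100.0
    return var[:n_pcs], np.cumsum(var)[:n_pcs]

# synthetic example with one dominant direction plus small noise,
# loosely mimicking the pattern of Figure 3a
rng = np.random.default_rng(0)
X = np.outer(rng.normal(size=200), rng.normal(size=27))
X += 0.05 * rng.normal(size=X.shape)
per_pc, cumulative = explained_variance(X)
print(per_pc[0] > 90.0, cumulative[6] > 99.0)
```

With a strongly correlated data set, most of the variance collapses onto the first few PCs, which is exactly the behavior reported for the 27 reacted-amount descriptors.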

FIGURE 3 PCA of the reacted amounts of 20 reactor experiments (6480 time points) and 27 extracellular rates (process descriptors). Data were normalized column-wise by dividing by the maximum absolute value of the reacted amount. (a) Left axis: explained variance of each PC; right axis: cumulative explained variance over the PC number. (b) Biplot of PC-2 (2.5% explained variance) over PC-1 (92.6% explained variance). Red dots represent score values. Blue vectors represent the coefficients of the process descriptors.

The PCs may be interpreted as lumped metabolic pathways whereby extracellular substrates are converted to by-products. Figure 3b exemplifies the biplot of coefficients and scores along the directions of PC-1 and PC-2. There seems to be a dominant lumped pathway, along the direction of PC-1, whereby a large set of substrates (left quadrants) is consumed for biomass and by-product production (right quadrants). High cell growth (high score values in the PC-1 direction) is associated with the accumulation of Pro, Ala, Glu, Glyc, NH4, For and Lac. PC-2 expresses variation in nutrient uptake and the corresponding by-product production with negligible biomass formation. For instance, Lac may accumulate to a different extent (depending on the substrate feeding profile) with negligible impact on biomass formation. Moreover, Lac accumulation seems to be negatively correlated with all other by-products, particularly Pro. By taking the seven PCs all together, the reaction correlation matrix, S (27 × 7), was built from the respective denormalized coefficients.

The effect of including the reaction correlation matrix in the hybrid model structure was preliminarily investigated. The main outcome was that with seven PCs the testing error and the AICc were significantly reduced, whereas the training error was not significantly affected (study in Supporting Information S1). The compression to seven PCs did not compromise the fitting power of the model. Based on this, all hybrid models investigated in the following sections included the reaction correlation matrix, S (27 × 7), in their structure.

3.2 | Calibration of the training method

Hybrid model structures were trained with the same ADAM method with default hyperparameters, weights dropout regularization, semidirect sensitivity equations (to compute gradients) and cross-validation. The training was always repeated 10 times with random permutations of training (12), validation (4) and testing (4) experiments. The data resampling was kept the same for every hybrid structure to ensure comparability. Cross-validation is time consuming but allows a fair comparison between FFNN- and LSTM-based hybrid models, as the overfitting propensity may differ between these structures. The effects of the weights dropout probability, of cross-validation and of the total number of iterations were preliminarily assessed. For this purpose, two fixed hybrid model configurations based on a FFNN (In(27)-ReLU(14)-ReLU(7)-Rate(7)-ODE(27)) and on a LSTM (In(27)-Tanh(14)-LSTM(7)-Rate(7)-ODE(27)) were adopted. The optimal weights dropout probability was searched between 0 and 0.45. The number of iterations was fixed to a sufficiently large number (4 × 10^4) to ensure convergence. The overall results are summarized in Figure 4. The dropout probability had a more pronounced effect on the average WMSE of the LSTM structure. The lowest test WMSE for the LSTM network was obtained with a dropout probability of 0.1, while for the FFNN the same results were obtained for dropout probabilities between 0 and 0.1. For this reason, a fixed weights dropout probability of 0.1 was adopted in every test performed, irrespective of structure. Figure 4c,d show the training, validation and testing WMSE over the training cycle using the selected parameter (average values among 10 repetitions with different permutations of train/validation/test experiments). These results show that 4 × 10^4 iterations are enough for training because the lowest average validation WMSE was always achieved early (<1 × 10^4 iterations) in the training cycle. Despite the very high CPU time cost, 4 × 10^4 iterations were used for the remaining studies to ensure that the cross-validation minimum is always spotted for every FFNN- or LSTM-hybrid structure investigated, irrespective of complexity. This is essential to ensure a fair comparison between both approaches. In future studies, the CPU time to train LSTM-hybrid structures may be reduced by employing stochastic regularization based on minibatch size and weights dropout in replacement of cross-validation (Pinto et al., 2022).

3.3 | Searching for the optimal hybrid model configuration

A total of 784 hybrid configurations based on FFNNs or LSTMs were systematically compared. The number of hidden layers, the number

FIGURE 4 Effect of weights dropout regularization on the training, validation, and testing WMSE for an FFNN hybrid model (In(27)‐ReLU(14)‐ReLU(7)‐Rate(7)‐ODE(27)) and an LSTM hybrid model (In(27)‐Tanh(14)‐LSTM(7)‐Rate(7)‐ODE(27)). (a) Final WMSE as a function of the weight dropout probability for the FFNN hybrid model. (b) Final WMSE as a function of the weight dropout probability for the LSTM hybrid model. (c) WMSE over the training iterations for the FFNN hybrid model using a weight dropout probability of 0.1. (d) WMSE over the training cycle for the LSTM hybrid model using a weight dropout probability of 0.1.
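The weight dropout regularization scanned in Figure 4 can be sketched as an inverted-dropout mask applied to the weight updates. The sketch below uses a plain gradient step for brevity (the paper uses ADAM); the function names and the toy gradient values are illustrative only, not the authors' implementation.

```python
import numpy as np

def dropout_mask(shape, p_drop, rng):
    # Inverted dropout: zero an entry with probability p_drop and
    # rescale the survivors by 1/(1 - p_drop) to keep the expectation.
    keep = rng.random(shape) >= p_drop
    return keep / (1.0 - p_drop)

def step_with_weight_dropout(weights, grad, lr, p_drop, rng):
    # One stochastic update in which a random subset of the weight
    # gradients is dropped, acting as a regularizer (cf. Figure 4, p = 0.1).
    mask = dropout_mask(weights.shape, p_drop, rng)
    return weights - lr * grad * mask

rng = np.random.default_rng(0)
w = np.ones((4, 3))
g = np.full((4, 3), 0.5)
w_next = step_with_weight_dropout(w, g, lr=0.1, p_drop=0.1, rng=rng)
```

Each weight is either left untouched (its gradient was dropped) or updated with the rescaled gradient, so over many iterations the expected update matches the unregularized one.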

The number of hidden layers, the number of nodes in the hidden layers and the types of activation functions were evaluated. In all cases, the first layer was the input layer, In(27), with 27 inputs of normalized concentrations. It was followed by one or more feedforward hidden layers with ny outputs (either ReLU(ny), Tanh(ny) or Lin(ny), depending on the activation function) or a peephole LSTM layer (LSTM(ny)). The last layer always had seven outputs corresponding to the specific reaction rates (Rate(7)). The specific rates were then passed to the material balance equations (system of ODEs) to calculate the 27 concentrations (ODE(27)). LSTM‐based hybrid structures include at least one LSTM layer, whereas FFNN hybrid structures include one or more feedforward hidden layers. The "smallest" model with acceptable results was a shallow hybrid FFNN structure with four hidden nodes. Additional layers were added with sizes 4–27, up to four hidden layers. The number of weights varied between 147 and 2464. In the case of hybrid LSTM structures, LSTM sizes varied between 4 and 27. Up to four stacked LSTMs were investigated. The number of weights varied between 275 and 2977. In all cases, structures with a number of weights larger than the number of training data points were disregarded. Some FFNNs had a complexity (number of weights) comparable to the LSTM‐based structures, but in general LSTM‐based structures tend to have a higher number of weights than FFNNs due to their complex internal structure composed of gate layers. The 784 hybrid structures were trained with the same pre‐calibrated training method described in the previous section. The partial results of the top 10 best performing LSTM and FFNN hybrid models are shown in Table 1. The complete set of results for the 784 structures is provided in the supplementary material (Supporting Information S2).

Overall, LSTM hybrid models consistently outperformed the FFNN hybrid models in terms of average train and test WMSE over the 10 repetitions. Since FFNNs have comparatively fewer weights, their AICc tends to approximate that of LSTM hybrid structures. Thus, the AICc discrimination criterion, which balances the number of parameters against the goodness of the model fit (error), does not point to the same conclusions as the minimum test WMSE. In the case of hybrid FFNN structures, the ReLU activation function generally outperformed Tanh. Nevertheless, the FFNN hybrid structure with the best predictive power comprised four Tanh hidden layers (In(27)‐Tanh(8)‐Tanh(8)‐Tanh(8)‐Tanh(7)‐Rate(7)‐ODE(27)). The combination of a ReLU first hidden layer followed by one LSTM hidden layer yielded the overall best predictive power (In(27)‐ReLU(16)‐LSTM(7)‐Rate(7)‐ODE(27)). Two stacked LSTM layers also achieved a similar result, although at the cost of higher complexity (Table 1). Stacking LSTM layers gives the model the ability to fit complex high‐order functions (Jiang et al., 2021). In this regard, stacking LSTM layers and introducing a single LSTM layer led to similar performances. In conclusion, stacking LSTM layers was not advantageous for this problem, as the complexity (number of weights) increased without a sensible improvement of the predictive power.

TABLE 1 Training and testing performance for the top 10 best LSTM and FFNN hybrid models (as measured by the lowest test WMSE + 1σ).

Hybrid LSTM structures WMSE train WMSE test AICc Npar Niter Time (s)

In(27)‐ReLU(16)‐LSTM(7)‐Rate(7)‐ODE(27) 2.64E‐03 ( ± 7.01E‐04) 5.51E‐03 ( ± 6.40E‐04) −1.82E + 04 ( ± 895) 1071 4.0E + 04 1.59E + 03 ( ± 16.6)

In(27)‐ReLU(11)‐LSTM(7)‐Rate(7)‐ODE(27) 3.04E‐03 ( ± 5.50E‐04) 5.25E‐03 ( ± 9.14E‐04) −1.87E + 04 ( ± 647) 791 4.0E + 04 1.55E + 03 ( ± 11.9)

In(27)‐ReLU(17)‐LSTM(7)‐Rate(7)‐ODE(27) 2.77E‐03 ( ± 7.37E‐04) 5.59E‐03 ( ± 7.76E‐04) −1.78E + 04 ( ± 964) 1127 4.0E + 04 1.47E + 03 ( ± 7.23)

In(27)‐ReLU(26)‐LSTM(7)‐Rate(7)‐ODE(27) 2.61E‐03 ( ± 6.93E‐04) 5.73E‐03 ( ± 1.03E‐03) −1.53E + 04 ( ± 984) 1631 4.0E + 04 1.6E + 03 ( ± 16.4)

In(27)‐Lin(10)‐Lin(10)‐LSTM(7)‐Rate(7)‐ODE(27) 3.60E‐03 ( ± 7.63E‐04) 5.50E‐03 ( ± 1.26E‐03) −1.79E + 04 ( ± 747) 845 4.0E + 04 1.73E + 03 ( ± 11.9)

In(27)‐ReLU(25)‐LSTM(7)‐Rate(7)‐ODE(27) 2.78E‐03 ( ± 5.12E‐04) 5.60E‐03 ( ± 1.29E‐03) −1.54E + 04 ( ± 677) 1575 4.0E + 04 1.47E + 03 ( ± 11.8)

In(27)‐LSTM(15)‐LSTM(7)‐Rate(7)‐ODE(27) 3.31E‐03 ( ± 9.89E‐04) 5.84E‐03 ( ± 1.19E‐03) 1.38E + 04 ( ± 1.19E + 03) 2950 4.0E + 04 2.33E + 03 ( ± 22.3)

In(27)‐ReLU(4)‐LSTM(7)‐Rate(7)‐ODE(27) 4.01E‐03 ( ± 3.41E‐04) 5.83E‐03 ( ± 1.23E‐03) −1.88E + 04 ( ± 309) 399 4.0E + 04 1.38E + 03 ( ± 5.53)

In(27)‐ReLU(14)‐LSTM(7)‐Rate(7)‐ODE(27) 2.91E‐03 ( ± 7.90E‐04) 6.10E‐03 ( ± 9.87E‐04) −1.83E + 04 ( ± 1.04E + 03) 959 4.0E + 04 1.58E + 03 ( ± 16.9)

In(27)‐ReLU(24)‐LSTM(7)‐Rate(7)‐ODE(27) 2.59E‐03 ( ± 7.04E‐04) 5.90E‐03 ( ± 1.25E‐03) −1.6E + 04 ( ± 962) 1519 4.0E + 04 1.57E + 03 ( ± 12.9)

Hybrid FFNN structures WMSE train WMSE test AICc Npar Niter Time (s)

In(27)‐Tanh(8)‐Tanh(8)‐Tanh(8)‐Tanh(7)‐Rate(7)‐ODE(27) 4.06E‐03 ( ± 1.40E‐03) 6.36E‐03 ( ± 2.16E‐03) −1.88E + 04 ( ± 1.19E + 03) 431 4.0E + 04 1.22E + 03 ( ± 11.5)

In(27)‐Tanh(6)‐Tanh(6)‐Tanh(6)‐Tanh(7)‐Rate(7)‐ODE(27) 4.03E‐03 ( ± 9.18E‐04) 6.81E‐03 ( ± 1.82E‐03) −1.91E + 04 ( ± 881) 301 4.0E + 04 1.13E + 03 ( ± 7.75)

In(27)‐Tanh(6)‐Lin(6)‐Lin(6)‐Lin(7)‐Rate(7)‐ODE(27) 4.16E‐03 ( ± 1.30E‐03) 7.42E‐03 ( ± 1.75E‐03) −1.9E + 04 ( ± 1.08E + 03) 301 4.0E + 04 1.24E + 03 ( ± 14.4)

In(27)‐ReLU(27)‐ReLU(27)‐ReLU(7)‐Rate(7)‐ODE(27) 3.97E‐03 ( ± 1.51E‐03) 7.13E‐03 ( ± 2.51E‐03) −1.33E + 04 ( ± 1.19E + 03) 1708 4.0E + 04 1.1E + 03 ( ± 13.4)

In(27)‐ReLU(20)‐ReLU(20)‐ReLU(7)‐Rate(7)‐ODE(27) 4.12E‐03 ( ± 1.02E‐03) 7.16E‐03 ( ± 2.71E‐03) −1.64E + 04 ( ± 874) 1127 4.0E + 04 973 ( ± 6.48)

In(27)‐ReLU(25)‐ReLU(25)‐ReLU(7)‐Rate(7)‐ODE(27) 4.09E‐03 ( ± 1.03E‐03) 7.19E‐03 ( ± 2.93E‐03) −1.43E + 04 ( ± 837) 1532 4.0E + 04 965 ( ± 5.52)

In(27)‐ReLU(9)‐ReLU(9)‐ReLU(7)‐Rate(7)‐ODE(27) 4.07E‐03 ( ± 9.19E‐04) 7.22E‐03 ( ± 2.93E‐03) −1.88E + 04 ( ± 799) 412 4.00E + 04 897 ( ± 61.2)

In(27)‐ReLU(19)‐ReLU(19)‐ReLU(7)‐Rate(7)‐ODE(27) 4.07E‐03 ( ± 9.38E‐04) 7.15E‐03 ( ± 3.20E‐03) −1.67E + 04 ( ± 754) 1052 4.0E + 04 1.07E + 03 ( ± 11.4)

In(27)‐ReLU(15)‐ReLU(15)‐ReLU(7)‐Rate(7)‐ODE(27) 3.95E‐03 ( ± 6.49E‐04) 7.31E‐03 ( ± 3.14E‐03) −1.78E + 04 ( ± 586) 772 4.0E + 04 1.07E + 03 ( ± 4.85)

In(27)‐ReLU(26)‐ReLU(26)‐ReLU(7)‐Rate(7)‐ODE(27) 4.58E‐03 ( ± 1.45E‐03) 7.34E‐03 ( ± 3.12E‐03) −1.34E + 04 ( ± 995) 1619 4.0E + 04 1.1E + 03 ( ± 31.1)

Note: Every model was trained 10 times with randomly selected training/validation/testing data partitions. The 10 partitions were kept the same for comparability. Niter, number of iterations; Npar, number of parameters.
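The AICc column in Table 1 weighs goodness of fit against parameter count. A minimal sketch of one common least-squares form of the criterion follows; the paper does not spell out its exact constants or effective sample size, so the example numbers below are illustrative assumptions, not values from the study.

```python
import math

def aicc(mse, n, k):
    # Least-squares AIC with the small-sample correction term:
    # AIC = n*ln(MSE) + 2k;  AICc = AIC + 2k(k+1)/(n - k - 1).
    aic = n * math.log(mse) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

# A model with more parameters must fit markedly better to win on AICc.
# Hypothetical n; parameter counts sized like the best FFNN and LSTM models.
ffnn_like = aicc(4.0e-3, n=3000, k=431)
lstm_like = aicc(2.6e-3, n=3000, k=1071)
```

Because the correction term grows quickly as k approaches n, heavily parameterized structures (e.g., stacked LSTMs) are penalized even when their raw WMSE is lower.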


FIGURE 5 Boxplot of the training and testing WMSE and AICc for the best LSTM (In(27)‐ReLU(16)‐LSTM(7)‐Rate(7)‐ODE(27)) and best FFNN (In(27)‐Tanh(8)‐Tanh(8)‐Tanh(8)‐Tanh(7)‐Rate(7)‐ODE(27)) hybrid models. Results from 10 train/test partitions obtained by experiment‐wise random data resampling. The bar represents the median, the box spans the first to third quartiles, and the whiskers mark the minimum and maximum.
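The experiment-wise resampling referenced in the caption partitions whole cultivation runs, not individual samples, so all time points of a run stay in the same subset. A minimal sketch (the function name and seed are illustrative assumptions):

```python
import random

def split_experiments(exp_ids, n_train=12, n_val=4, n_test=4, seed=0):
    # Shuffle the 20 experiment identifiers and cut them into
    # train/validation/test groups of 12/4/4 whole runs.
    ids = list(exp_ids)
    assert len(ids) == n_train + n_val + n_test
    random.Random(seed).shuffle(ids)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

train, val, test = split_experiments(range(20), seed=1)
```

Repeating this with 10 different seeds and freezing the resulting partitions, as done here, lets every candidate structure be scored on identical data splits.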

Figure 5 shows the training and testing WMSE and AICc for the best LSTM hybrid model (In(27)‐ReLU(16)‐LSTM(7)‐Rate(7)‐ODE(27)) and the best FFNN hybrid model (In(27)‐Tanh(8)‐Tanh(8)‐Tanh(8)‐Tanh(7)‐Rate(7)‐ODE(27)). The best LSTM hybrid model had a training and testing WMSE of 2.64 × 10⁻³ ± 7.01 × 10⁻⁴ and 5.51 × 10⁻³ ± 6.40 × 10⁻⁴, respectively, and an AICc of −1.82 × 10⁴ ± 895. The best FFNN model had a training and testing WMSE of 4.06 × 10⁻³ ± 1.40 × 10⁻³ and 6.36 × 10⁻³ ± 2.16 × 10⁻³, respectively, and an AICc of −1.88 × 10⁴ ± 1.19 × 10³.

Overall, the average training and testing WMSE for the best LSTM hybrid model are around 35% and 13% lower, respectively, than for the best FFNN hybrid model (Figure 5 and Table 1). Despite the higher number of weights of the LSTM model compared to the FFNN (1071 and 431, respectively), the average AICc of the LSTM model is only 3% higher. Noteworthy, the LSTM hybrid model seems to be less affected by the test/validation/train data resampling, as it leads to narrower ranges of the training and testing WMSE (Figure 5 and Table 1).

Figure 6 further details the training results for the best LSTM and FFNN deep hybrid models. The results of the 10 permutations of train/validation/test are shown along with the predicted over measured concentrations for the best permutation (the one with the lowest training WMSE, which is also the lowest overall WMSE for train+test). Overall, the training was rather successful for both model structures, as denoted by the linear correlation coefficients (R² = 0.972 and 0.968). The LSTM has a slightly higher R² for the training and test data sets compared to the FFNN. The results of the 10 permutations also show that the LSTM hybrid model seems to be less affected by the test:validation:train data resampling, as evidenced by the coefficient of variation (CV) (Figure 6). The CV of the LSTM hybrid structure was 24.02 and 10.51 for the train and test WMSE, respectively. The CV of the FFNN hybrid structure was 31.25 and 30.76 for the train and test WMSE, respectively.

3.4 | Analysis of the best LSTM hybrid model

The hybrid LSTM model (In(27)‐ReLU(16)‐LSTM(7)‐Rate(7)‐ODE(27)), which achieved the lowest average and dispersion values of the training and testing WMSE, was analyzed in more detail. Figure 7 shows the predicted over measured concentrations for each of the 27 concentrations individually and for a particular train:validation:test partition, namely the one with the lowest train+test WMSE (Figure 6a). With few exceptions, the coefficient of determination (R²) of predicted over measured concentrations is higher than 0.85. The only exceptions were Glyc, Pro and Pyr. In the case of Pro, the R² for the testing subset was only 0.61. Pro is one of the metabolites with the lowest variation during the cultivation (around 33% difference between initial and final concentrations). Despite the low R², the dynamic prediction of Pro is in general within the experimental error bounds (see below). Of note, the R² for the viable cell concentration, which is the main product of this culture, had high values of 0.97 and 0.91 for the training and testing data subsets, respectively. As a rule of thumb, the R² for the training data subset tends to be higher than for the testing subset due to some degree of overfitting. However, several exceptions to this scenario were observed, which included Ala, Asp, Glu, Gln, Gly, His, Ile, Lac, Leu, Lys, Met, NH4 and Thr. These results obviously depend on the data resampling, but they suggest a successful training with mitigated overfitting.

FIGURE 6 Training results for the best LSTM (In(27)‐ReLU(16)‐LSTM(7)‐Rate(7)‐ODE(27)) and FFNN (In(27)‐Tanh(8)‐Tanh(8)‐Tanh(8)‐Tanh(7)‐Rate(7)‐ODE(27)) hybrid model structures. (a) WMSE and coefficient of determination (R²) for 10 randomly selected partitions of train (12):validation (4):test (4) experiments, for the best LSTM hybrid structure. (b) Normalized predicted over experimental concentrations for 27 process variables and 20 reactor experiments, for the best LSTM hybrid structure and the best partition (highlighted in a). (c) WMSE and coefficient of determination (R²) for 10 randomly selected partitions of train (12):validation (4):test (4) experiments, for the best FFNN hybrid structure. (d) Normalized predicted over experimental concentrations for 27 process variables and 20 reactor experiments, for the best FFNN hybrid structure and the best partition (highlighted in c). Blue circles are training data. Cyan circles are cross‐validation data. Green circles are test data. The gray dotted lines represent the experimental error bounds.
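The robustness comparison summarized in Figure 6 uses the coefficient of variation of the WMSE across the 10 partitions. A minimal sketch; the WMSE values below are made up for illustration and are not the study's numbers.

```python
import statistics

def coefficient_of_variation(values):
    # CV (%) = 100 * sample standard deviation / mean; a lower CV
    # means the error depends less on the random data partition.
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical WMSE values over 10 train/validation/test permutations.
wmse_runs = [2.1e-3, 2.6e-3, 2.4e-3, 3.0e-3, 2.8e-3,
             2.2e-3, 2.7e-3, 2.5e-3, 2.9e-3, 2.3e-3]
cv = coefficient_of_variation(wmse_runs)
```

Comparing such CV values between model families (e.g., 10.51 vs. 30.76 for the test WMSE here) quantifies sensitivity to resampling independently of the error magnitude itself.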

Figure 8 further details the dynamic profiles of experimental concentrations and the corresponding hybrid model predictions for a training and a testing fed‐batch experiment (the remaining model predictions for the test cultivations are provided in Supporting Information S3). The hybrid model correctly predicted the concentration dynamics of the 27 extracellular species, including biomass, key substrates, amino acids and metabolic by‐products. The predicted dynamic profiles were always within the error bounds for both the training and testing fed‐batch experiments. The predictive power of the LSTM hybrid model is noteworthy given that fed‐batch dynamics are notoriously difficult to predict with mechanistic modeling approaches. Mammalian cell growth kinetics frequently require more complex models accounting for inhibitory effects of by‐products such as Lac and NH4 (Pörtner & Schäfer, 1996) and possibly several other metabolites that increase the cell death rate (Chong et al., 2011). Fed‐batch experiments reach higher cell densities than batch, and consequently higher concentrations of toxic by‐products are obtained. As an example, Lac, NH4 and For varied between 2.62–38.67 mM, 0.70–7.21 mM and 0.15–2.33 mM, respectively, in the performed HEK293 experiments. The hybrid LSTM model efficiently captured such kinetic effects, given the accurate prediction of the viable cell dynamics, particularly the transition between the cell growth and decay phases.

3.5 | Is the LSTM architecture advantageous in a hybrid modeling scheme?

Hybrid modeling of mammalian systems, mostly of CHO cultures, has mainly explored the combination of shallow FFNNs and material balance equations in the form of ODEs (Agharafeie et al., 2023).

FIGURE 7 Predicted over experimental concentrations of the 27 process variables individually, for all 20 reactor experiments, by the LSTM hybrid model (In(27)‐ReLU(16)‐LSTM(7)‐Rate(7)‐ODE(27)) and the best train/validation/test data partition. (Xv) biomass, (Ala) Alanine, (Arg) Arginine, (Asn)
Asparagine, (Asp) Aspartate, (Cit) Citrate, (Cys) Cysteine, (For) Formic acid, (Glc) Glucose, (Glu) Glutamate, (Gln) Glutamine, (Glyc) Glycerol, (His)
Histidine, (Ile) Isoleucine, (Lac) Lactate, (Leu) Leucine, (Lys) Lysine, (Met) Methionine, (NH4) Ammonium, (Phe) Phenylalanine, (Pro) Proline, (Pyr)
Pyruvate, (Ser) Serine, (Thr) Threonine, (Trp) Tryptophan, (Tyr) Tyrosine and (Val) Valine. Training data (blue symbols) and test data (green symbols).
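The per-variable agreement shown in Figure 7 is quantified with the coefficient of determination of predicted over measured concentrations. A minimal sketch with made-up values:

```python
def r_squared(measured, predicted):
    # R^2 = 1 - SS_res/SS_tot for predicted vs. measured values.
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Perfect predictions give R^2 = 1; systematic errors lower it.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
```

Note that R² compares the residuals against the variable's own spread, which is why a slowly varying metabolite such as Pro can show a low R² even when its predictions stay within the experimental error bounds.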

FIGURE 8 Dynamic profiles of experimental concentrations and concentrations predicted by the best LSTM hybrid model (In(27)‐ReLU(16)‐LSTM(7)‐Rate(7)‐ODE(27)) for each variable individually and for selected training and testing experiments. (Xv) biomass, (Ala) Alanine, (Arg) Arginine, (Asn)
Asparagine, (Asp) Aspartate, (Cit) Citrate, (Cys) Cysteine, (For) Formic acid, (Glc) Glucose, (Glu) Glutamate, (Gln) Glutamine, (Glyc) Glycerol, (His) Histidine,
(Ile) Isoleucine, (Lac) Lactate, (Leu) Leucine, (Lys) Lysine, (Met) Methionine, (NH4) Ammonium, (Phe) Phenylalanine, (Pro) Proline, (Pyr) Pyruvate, (Ser)
Serine, (Thr) Threonine, (Trp) Tryptophan, (Tyr) Tyrosine and (Val) Valine. Training data (blue lines and symbols) and test data (green lines and symbols).
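The dynamic profiles of Figure 8 come from the serial hybrid structure: the network maps the current state to seven specific rates, the correlation matrix S (27 × 7) expands them to the 27 species, and the material balances are integrated over time. Below is a forward-Euler sketch with a stand-in rate function; the actual model uses the trained network, a stiff ODE solver and fed-batch feeding/dilution terms that are omitted here for brevity.

```python
import numpy as np

def simulate(c0, rate_fn, S, t_grid):
    # Integrate dc/dt = S @ r(c): rate_fn plays the role of the trained
    # network (7 specific rates), S maps rates to species balances.
    c = np.asarray(c0, dtype=float)
    traj = [c.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        c = c + (t1 - t0) * (S @ rate_fn(c))
        traj.append(c.copy())
    return np.array(traj)

# Toy 2-species, 1-rate system: substrate converted into product.
S_toy = np.array([[1.0], [-1.0]])
traj = simulate([0.0, 1.0], lambda c: np.array([0.5]), S_toy, [0.0, 1.0, 2.0])
```

The key design point of the serial structure is that the network never predicts concentrations directly; it predicts rates, and mass conservation is enforced by S and the balances.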

The inclusion of reliable mechanistic equations in the hybrid model generally reduces the data dependency, improves the predictive power (e.g., Bayer et al., 2021, 2022; Narayanan et al., 2019; Nold et al., 2023; Senger & Karim, 2003) and improves model transferability across different scales (Bayer et al., 2021).

For the HEK293 process studied here, the best FFNN hybrid model had a deep structure (In(27)‐Tanh(8)‐Tanh(8)‐Tanh(8)‐Tanh(7)‐Rate(7)‐ODE(27)) with 431 parameters. The final training and testing WMSE were 4.06 × 10⁻³ ± 1.40 × 10⁻³ and 6.36 × 10⁻³ ± 2.16 × 10⁻³. The best shallow FFNN hybrid model (In(27)‐Tanh(7)‐Rate(7)‐ODE(27)) was rather sensitive to data resampling. It achieved a final WMSE for training and testing of 6.13 × 10¹⁷ ± 9.87 × 10¹⁷ and 1.84 × 10¹⁷ ± 4.08 × 10¹⁷, respectively (Supporting Information S3). This corroborates the findings of previous studies (Pinto et al., 2022, 2023a, 2023b, 2023c) showing that deep hybrid modeling outperforms shallow hybrid modeling.

As for the difference between deep FFNN and deep LSTM hybrid models, the results shown in Table 1 and Figures 5 and 6 reveal a 35% improvement of the best LSTM hybrid model over the best FFNN hybrid model in terms of average training WMSE. In terms of predictive power (average testing WMSE), the gap is narrower but still significant, with a 13% improvement. The training and testing WMSE dispersion (given by the CV) is in general much lower for the LSTM hybrid model than for the FFNN model (23% and 66% improvement, respectively). This implies that LSTM hybrid models are in general more robust to data resampling. Despite these advantages, LSTM hybrid models generally require a larger number of parameters due to their inherent complexity, and their training is much slower than that of FFNN hybrid models with the same number of weights (around 50% higher CPU time for the LSTM hybrid models). These observations hold for all 784 evaluated structures, whose results are shown in the supplementary material (Supporting Information S2).

The LSTM structure is known to capture complex dynamics (short‐ and long‐term memory effects). This capability is inherent to the LSTM: its three gates (input, output, forget) allow it to selectively remember or forget information, so that it can capture and retain relevant features of the input data. In a biological context, the advantage of LSTM hybrid models is likely to be more noticeable in problems with strong population and/or intracellular dynamics. The FFNN hybrid model includes dynamic ODEs of extracellular variables but completely disregards population and intracellular dynamics. The hybrid LSTM model structure may be particularly effective when cells change significantly in size and composition depending on cultivation conditions. Typically, average cell sizes are larger during the exponential growth phase than during the transition and cell death phases, and these differences change with each cell line and growth condition (Nielsen et al., 1997). Previous studies on CHO cells have shown that single cell volume, dry weight and internal composition may change significantly from the start to the end of a cultivation (Széliová et al., 2020). When the cell properties and composition change significantly over a cultivation time window, a complex memory effect may be created. In such cases, the LSTM network offers a structural advantage in capturing such complex dynamics over the FFNN structure, which treats the population of cells and the intracellular phase as a static process.

4 | CONCLUSIONS

Deep hybrid modeling merging DNNs and first principles equations was investigated and showcased with a HEK293 fed‐batch process. Hybrid structures based on multilayered FFNNs and LSTMs in many different configurations (standalone, combined, stacked, with varying depths and layer sizes) were systematically compared. For the HEK293 process studied, the LSTM hybrid models outperformed the FFNN hybrid models in terms of training and testing error in most of the scenarios tested. The best LSTM hybrid structure (In(27)‐ReLU(16)‐LSTM(7)‐Rate(7)‐ODE(27)) showed a high predictive power for the dynamics of the 27 measured extracellular compounds. LSTMs may learn complex dynamics that mimic the biochemical "memory" of cell populations with varying intracellular composition exposed to a transient environment for long periods of time. Nevertheless, the inclusion of ODEs of extracellular compounds also confers to FFNN hybrid models the capacity to effectively describe extracellular dynamics. Given the results obtained, both FFNN and LSTM networks are good candidates for hybrid modeling. The adoption of a more complex LSTM structure may be justified in cases where complex population dynamics, with strong variations in cell size, gene copy number and intracellular composition, introduce significant dynamic variability into the process.

AUTHOR CONTRIBUTIONS

All authors were involved in the conception/design of the study or the development of the study protocol. Gil Poiares‐Oliveira and Ludovic Peeters were involved in methodology selection and development. Ludovic Peeters and Patrick Dumas were involved in the acquisition of laboratory data. All authors analyzed and interpreted the data, and were involved in drafting the manuscript or critically reviewing it for important intellectual content. All authors had full access to the data and approved the manuscript before it was submitted by the corresponding author.

ACKNOWLEDGMENTS

The authors acknowledge GlaxoSmithKline Biologicals SA for providing the experimental data used in this study. JP acknowledges PhD grant SFRD/BD14610472019, Fundação para a Ciência e Tecnologia (FCT), Portugal.

CONFLICTS OF INTERESTS STATEMENT

LP and PD are employees of the GSK group of companies. The other authors have no competing interests to declare that are relevant to the content of this article.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available in GitHub at https://github.com/jrcramos/Hybrid-modeling-of-bioreactor-with-LSTM.

ORCID

Rui Oliveira http://orcid.org/0000-0001-8077-4177

REFERENCES Helgers, H., Hengelbrock, A., Rosengarten, J. F., Stitz, J., Schmidt, A., &
Aehle, M., Simutis, R., & Lübbert, A. (2010). Comparison of viable cell Strube, J. (2022). Towards autonomous process control—digital twin
concentration estimation methods for a mammalian cell cultivation for HIV‐Gag VLP production in HEK293 cells using a dynamic
process. Cytotechnology, 62, 413–422. https://doi.org/10.1007/ metabolic model. Processes, 10, 2015. https://doi.org/10.1007/978-
s10616-010-9291-z 1-4419-9863-7_1251
Agharafeie, R., Ramos, J. R. C., Mendes, J. M., & Oliveira, R. (2023). From Helleckes, L. M., Hemmerich, J., Wiechert, W., von Lieres, E., &
shallow to deep bioprocess hybrid modeling: Advances and future. Grünberger, A. (2022). Machine learning in bioprocess development:
Fermentation, 9, 922. from promise to practice. Trends in Biotechnology, 41(6), 817–835.
Alzubaidi, L., Zhang, J., Humaidi, A. J., Al‐Dujaili, A., Duan, Y., Al‐Shamma, Helleckes, L. M., Hemmerich, J., Wiechert, W., von Lieres, E., &
O., Santamaría, J., Fadhel, M. A., Al‐Amidie, M., & Farhan, L. (2021). Grünberger, A. (2023). Machine learning in bioprocess development:
Review of deep learning: concepts, CNN architectures, challenges, from promise to practice. Trends in Biotechnology, 41, 817–835.
applications, future directions. Journal of Big Data, 8, 53. https://doi. https://linkinghub.elsevier.com/retrieve/pii/S0167779922002815
org/10.1186/s40537-021-00444-8 Hochreiter, S., & Schmidhuber, J. (1997). Long short‐term memory. Neural
Bayer, B., Duerkop, M., Striedner, G., & Sissolak, B. (2021). Model Computation, 9, 1735–1780. https://direct.mit.edu/neco/article/9/
transferability and reduced experimental burden in cell culture 8/1735-1780/6109
process development facilitated by hybrid modeling and intensified Jiang, Y., Wang, D., Yao, Y., Eubel, H., Künzler, P., Møller, I. M., & Xu, D.
design of experiments. Frontiers in Bioengineering and Biotechnology, (2021). MULocDeep: A deep‐learning framework for protein sub-
9, 1275. https://doi.org/10.3389/fbioe.2021.740215/full cellular and suborganellar localization prediction with residue‐level
Bayer, B., Duerkop, M., Pörtner, R., & Möller, J. (2022). Comparison of interpretation. Computational and Structural Biotechnology Journal, 19,
mechanistic and hybrid modeling approaches for characterization of 4825–4839. https://doi.org/10.1016/j.csbj.2021.08.027
a CHO cultivation process: Requirements, pitfalls and solution paths. Joiner, J., Huang, Z., McHugh, K., Stebbins, M., Aron, K., Borys, M., &
Biotechnology Journal, 18, 2200381. https://doi.org/10.1002/biot. Khetan, A. (2022). Process modeling of recombinant adeno‐
202200381 associated virus production in HEK293 cells. Current Opinion in
Cheng, X., Guo, Z., Shen, Y., Yu, K., & Gao, X. (2021). Knowledge and data‐ Chemical Engineering, 36, 100823. https://linkinghub.elsevier.com/
driven hybrid system for modeling fuzzy wastewater treatment retrieve/pii/S2211339822000338
process. Neural Computing and Applications, 35, 7185–7206. https:// Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic
doi.org/10.1007/s00521-021-06499-1 optimization. arxiv, 1, 1–15. http://arxiv.org/abs/1412.6980
Chong, W. P. K., Yusufi, F. N. K., Lee, D.‐Y., Reddy, S. G., Wong, N. S. C., Liang, S., & Srikant, R. (2016). Why deep neural networks for function
Heng, C. K., Yap, M. G. S., & Ho, Y. S. (2011). Metabolomics‐based approximation? http://arxiv.org/abs/1610.04161
identification of apoptosis‐inducing metabolites in recombinant fed‐ Monteiro, M., Fadda, S., & Kontoravdi, C. (2023). Towards advanced
batch CHO culture media. Journal of Biotechnology, 151, 218–224. bioprocess optimization: A multiscale modelling approach.
https://doi.org/10.1016/j.jbiotec.2010.12.010 Computational and Structural Biotechnology Journal, 21, 3639–3655.
Cuperlovic‐Culf, M., Nguyen‐Tran, T., & Bennett, S. A. L. (2023). Machine https://linkinghub.elsevier.com/retrieve/pii/S2001037023002362
learning and hybrid methods for metabolic pathway modeling. Mowbray, M., Vallerio, M., Perez‐Galvan, C., Zhang, D., Del Rio Chanona, A.,
Methods in Molecular Biology, 2553, 417–439. https://doi.org/10. & Navarro‐Brull, F. J. (2022). Industrial data science – A review of
1007/978-1-0716-2617-7_18 machine learning applications for chemical and process industries.
van Doremalen, N., Lambe, T., Spencer, A., Belij‐Rammerstorfer, S., Reaction Chemistry & Engineering, 7, 1471–1509. http://xlink.rsc.org/?
Purushotham, J. N., Port, J. R., Avanzato, V. A., Bushmaker, T., DOI=D1RE00541C
Flaxman, A., Ulaszewska, M., Feldmann, F., Allen, E. R., Sharpe, H., Mowbray, M., Savage, T., Wu, C., Song, Z., Cho, B. A., Del Rio‐Chanona, E.
Schulz, J., Holbrook, M., Okumura, A., Meade‐White, K., Pérez‐ A., & Zhang, D. (2021). Machine learning for biochemical engineer-
Pérez, L., Edwards, N. J., … Munster, V. J. (2020). ChAdOx1 nCoV‐19 ing: A review. Biochemical Engineering Journal, 172, 108054. https://
vaccine prevents SARS‐CoV‐2 pneumonia in rhesus macaques. doi.org/10.1016/j.bej.2021.108054
Nature, 586, 578–582. http://www.nature.com/articles/s41586- Mukherjee, A., & Bhattacharyya, D. (2023). Hybrid series/parallel all‐
020-2608-y nonlinear dynamic‐static neural networks: development, training,
… and application to chemical processes. Industrial & Engineering Chemistry Research, 62, 3221–3237. https://doi.org/10.1021/acs.iecr.2c03339
Dumont, J., Euwart, D., Mei, B., Estes, S., & Kshirsagar, R. (2016). Human cell lines for biopharmaceutical manufacturing: history, status, and future perspectives. Critical Reviews in Biotechnology, 36, 1110–1122. https://doi.org/10.3109/07388551.2015.1084266
Fu, P.‐C., & Barford, J. P. (1996). A hybrid neural network—first principles approach for modelling of cell metabolism. Computers & Chemical Engineering, 20, 951–958. https://linkinghub.elsevier.com/retrieve/pii/0098135495001905
Gers, F. A., Schraudolph, N. N., & Schmidhuber, J. (2000). Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 3, 115–143.
Hansen, L. D., Stokholm‐Bjerregaard, M., & Durdevic, P. (2022). Modeling phosphorous dynamics in a wastewater treatment process using Bayesian optimized LSTM. Computers & Chemical Engineering, 160, 107738. https://linkinghub.elsevier.com/retrieve/pii/S0098135422000795
Hartmann, F. S. F., Udugama, I. A., Seibold, G. M., Sugiyama, H., & Gernaey, K. V. (2022). Digital models in biotechnology: Towards multi‐scale integration and implementation. Biotechnology Advances, 60, 108015. https://linkinghub.elsevier.com/retrieve/pii/S0734975022001112
Narayanan, H., Sokolov, M., Morbidelli, M., & Butté, A. (2019). A new generation of predictive models: The added value of hybrid models for manufacturing processes of therapeutic proteins. Biotechnology and Bioengineering, 116, 2540–2549. https://doi.org/10.1002/bit.27097
Nguyen, T. N. T., Sha, S., Hong, M. S., Maloney, A. J., Barone, P. W., Neufeld, C., Wolfrum, J., Springs, S. L., Sinskey, A. J., & Braatz, R. D. (2021). Mechanistic model for production of recombinant adeno‐associated virus via triple transfection of HEK293 cells. Molecular Therapy ‐ Methods & Clinical Development, 21, 642–655. https://linkinghub.elsevier.com/retrieve/pii/S2329050121000723
Nielsen, L. K., Reid, S., & Greenfield, P. F. (1997). Cell cycle model to describe animal cell size variation and lag between cell number and biomass dynamics. Biotechnology and Bioengineering, 56, 372–379. https://doi.org/10.1002/(SICI)1097-0290(19971120)56:4%3C372::AID-BIT3%3E3.0.CO;2-L
Nold, V., Junghans, L., Bayer, B., Bisgen, L., Duerkop, M., Drerup, R., Presser, B., Schwab, T., Bluhmki, E., Wieschalka, S., & Knapp, B.
(2023). Boost dynamic protocols for producing mammalian biopharmaceuticals with intensified DoE—a practical guide to analyses with OLS and hybrid modeling. Frontiers in Chemical Engineering, 4, 1–15. https://doi.org/10.3389/fceng.2022.1044245/full
Pinto, J., Ramos, J. R. C., Costa, R. S., & Oliveira, R. (2023a). A general hybrid modeling framework for systems biology applications: Combining mechanistic knowledge with deep neural networks under the SBML standard. AI, 4, 303–318. https://www.mdpi.com/2673-2688/4/1/14
Pinto, J., Ramos, J. R. C., Costa, R. S., & Oliveira, R. (2023c). Hybrid deep modeling of a GS115 (Mut+) Pichia pastoris culture with state–space reduction. Fermentation, 9, 643. https://www.mdpi.com/2311-5637/9/7/643
Pinto, J., Mestre, M., Ramos, J., Costa, R. S., Striedner, G., & Oliveira, R. (2022). A general deep hybrid model for bioreactor systems: Combining first principles with deep neural networks. Computers & Chemical Engineering, 165, 107952. https://linkinghub.elsevier.com/retrieve/pii/S0098135422002897
Pinto, J., Ramos, J. R. C., Costa, R. S., Rossell, S., Dumas, P., & Oliveira, R. (2023b). Hybrid deep modeling of a CHO‐K1 fed‐batch process: Combining first‐principles with deep neural networks. Frontiers in Bioengineering and Biotechnology, 11, 1–16. https://doi.org/10.3389/fbioe.2023.1237963/full
Pörtner, R., & Schäfer, T. (1996). Modelling hybridoma cell growth and metabolism — a comparison of selected models and data. Journal of Biotechnology, 49, 119–135. https://linkinghub.elsevier.com/retrieve/pii/0168165696015350
Portolano, N., Watson, P. J., Fairall, L., Millard, C. J., Milano, C. P., Song, Y., Cowley, S. M., & Schwabe, J. W. R. (2014). Recombinant protein expression for structural biology in HEK 293F suspension cells: A novel and accessible approach. Journal of Visualized Experiments: JoVE, 92, 1–8. https://www.jove.com/t/51897/recombinant-protein-expression-for-structural-biology-in-hek-293f-suspension-cells-a-novel-and-accessible-approach
Ren, W., Sun, H., Gao, G. F., Chen, J., Sun, S., Zhao, R., Gao, G., Hu, Y., Zhao, G., Chen, Y., Jin, X., Fang, F., Chen, J., Wang, Q., Gong, S., Gao, W., Sun, Y., Su, J., He, A., … Sun, L. (2020). Recombinant SARS‐CoV‐2 spike S1‐Fc fusion protein induced high levels of neutralizing responses in nonhuman primates. Vaccine, 38, 5653–5658. https://linkinghub.elsevier.com/retrieve/pii/S0264410X20308525
Sanchez‐Felipe, L., Vercruysse, T., Sharma, S., Ma, J., Lemmens, V., Van Looveren, D., Arkalagud Javarappa, M. P., Boudewijns, R., Malengier‐Devlies, B., Liesenborghs, L., Kaptein, S. J. F., De Keyzer, C., Bervoets, L., Debaveye, S., Rasulova, M., Seldeslachts, L., Li, L.‐H., Jansen, S., Yakass, M. B., … Dallmeier, K. (2021). A single‐dose live‐attenuated YF17D‐vectored SARS‐CoV‐2 vaccine candidate. Nature, 590, 320–325. http://www.nature.com/articles/s41586-020-3035-9
Senger, R. S., & Karim, M. N. (2003). Neural‐network‐based identification of tissue‐type plasminogen activator protein production and glycosylation in CHO cell culture under shear environment. Biotechnology Progress, 19, 1828–1836. https://doi.org/10.1021/bp034109x
Smagulova, K., & James, A. P. (2019). A survey on LSTM memristive neural network architectures and applications. The European Physical Journal Special Topics, 228, 2313–2324. https://doi.org/10.1140/epjst/e2019-900046-x
von Stosch, M., Oliveira, R., Peres, J., & Feyo de Azevedo, S. (2014). Hybrid semi‐parametric modeling in process systems engineering: Past, present and future. Computers & Chemical Engineering, 60, 86–101. https://doi.org/10.1016/j.compchemeng.2013.08.008
Swiech, K., Picanço‐Castro, V., & Covas, D. T. (2012). Human cells: New platform for recombinant therapeutic protein production. Protein Expression and Purification, 84, 147–153. https://linkinghub.elsevier.com/retrieve/pii/S1046592812001313
Széliová, D., Ruckerbauer, D. E., Galleguillos, S. N., Petersen, L. B., Natter, K., Hanscho, M., Troyer, C., Causon, T., Schoeny, H., Christensen, H. B., Lee, D.‐Y., Lewis, N. E., Koellensperger, G., Hann, S., Nielsen, L. K., Borth, N., & Zanghellini, J. (2020). What CHO is made of: variations in the biomass composition of Chinese hamster ovary cell lines. Metabolic Engineering, 61, 288–300. https://linkinghub.elsevier.com/retrieve/pii/S1096717620301014
Teixeira, A., Cunha, A. E., Clemente, J. J., Moreira, J. L., Cruz, H. J., Alves, P. M., Carrondo, M. J. T., & Oliveira, R. (2005). Modelling and optimization of a recombinant BHK‐21 cultivation process using hybrid grey‐box systems. Journal of Biotechnology, 118, 290–303. https://linkinghub.elsevier.com/retrieve/pii/S0168165605002312
Teixeira, A. P., Alves, C., Alves, P. M., Carrondo, M. J., & Oliveira, R. (2007). Hybrid elementary flux analysis/nonparametric modeling: Application for bioprocess control. BMC Bioinformatics, 8, 30. https://doi.org/10.1186/1471-2105-8-30
Udugama, I. A., Öner, M., Lopez, P. C., Beenfeldt, C., Bayer, C., Huusom, J. K., Gernaey, K. V., & Sin, G. (2021). Towards digitalization in bio‐manufacturing operations: A survey on application of big data and digital twin concepts in Denmark. Frontiers in Chemical Engineering, 3, 1–14. https://doi.org/10.3389/fceng.2021.727152/full

SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.

How to cite this article: Ramos, J. R. C., Pinto, J., Poiares‐Oliveira, G., Peeters, L., Dumas, P., & Oliveira, R. (2024). Deep hybrid modeling of a HEK293 process: Combining long short‐term memory networks with first principles equations. Biotechnology and Bioengineering, 1–15. https://doi.org/10.1002/bit.28668