
International Journal of Plasticity 136 (2021) 102852

Contents lists available at ScienceDirect

International Journal of Plasticity


journal homepage: http://www.elsevier.com/locate/ijplas

Deep learning for plasticity and thermo-viscoplasticity


Diab W. Abueidda b,a,*, Seid Koric b,a,**, Nahil A. Sobh c, Huseyin Sehitoglu b

a National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, IL, USA
b Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, IL, USA
c Beckman Institute and Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, IL, USA

A R T I C L E  I N F O

Keywords:
Periodic media
Temporal convolutional network (TCN)
Recurrent neural network (RNN)
Multiphysics
Sequence learning

A B S T R A C T

Predicting history-dependent materials’ responses is crucial, as path-dependent behavior appears while characterizing or geometrically designing many materials (e.g., metallic and polymeric cellular materials), and it takes place in the manufacturing and processing of many materials (e.g., metal solidification). Such phenomena can be computationally intensive and challenging when numerical schemes such as the finite element method are used. Here, we have applied a variety of sequence learning models to almost instantly predict the history-dependent responses (stresses and energy) of a class of cellular materials as well as the multiphysics problem of steel solidification with multiple thermo-viscoplasticity constitutive models accounting for substantial temperature, time, and path dependencies, and phase transformation. We have shown that the gated recurrent unit (GRU) as well as the temporal convolutional network (TCN) can both accurately learn and almost instantly predict these irreversible, history- and time-dependent phenomena, while the TCN is more computationally efficient during the training process. This work may open the door for the broader adoption of data-driven models in similarly computationally challenging constitutive models in plasticity and inelasticity.

1. Introduction

Computational solid mechanics attempts to predict and/or optimize the response of mechanical problems using computer methods,
where such problems appear in engineered or natural systems. Conventional methods utilized to solve partial differential equations
(PDEs) involved in such computational solid mechanics problems are the mesh-free methods (Huerta et al., 2018), finite element
analysis (FEA) (Hughes, 2012), and isogeometric analysis (Hughes et al., 2018). Numerical schemes capable of modeling such
problems can be time-consuming and computationally expensive (e.g. Lee et al., 2017; Cheng et al., 2014; Shrimali et al., 2019; Miehe
et al., 1999; Mahnken et al., 2009; Koric and Thomas, 2006; Ge et al., 2019; Yoon et al., 2018).
In the meantime, current machine learning techniques, such as deep learning, which is inspired by the biological structure and
functioning of a brain, have accomplished significant achievements in broad areas of computer engineering, such as natural language
processing, computer vision, voice recognition, autonomous vehicle driving, and gaming (Goodfellow et al., 2016; Michalski et al.,
2013). The wide popularity of machine learning is due to the parallelization in current computer systems, notable speed enhancement,
and reduction in hardware cost.
Recently, the field of deep learning has been proven to be a promising tool for solving solid mechanics problems (Samaniego et al.,

* Corresponding author. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, IL, USA.
** Corresponding author. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, IL, USA.
E-mail addresses: abueidd2@illinois.edu (D.W. Abueidda), koric@illinois.edu (S. Koric).

https://doi.org/10.1016/j.ijplas.2020.102852
Received 26 March 2020; Received in revised form 11 August 2020; Accepted 2 September 2020
Available online 13 September 2020
0749-6419/© 2020 Elsevier Ltd. All rights reserved.

Nomenclature

3D Three-dimensional
%C Carbon content
σ̄ij Component of the average stress tensor
σvM von Mises effective stress
ϵ̄ij Component of the average strain tensor
ϵ̇ Total strain rate tensor
ϵ̇el Elastic strain rate tensor
ϵ̇ie Inelastic strain rate tensor
ϵ̇th Thermal strain rate tensor
b Body force density vector
Ft Temporal loading conditions
G Geometry descriptor
ot Temporal output
P Material properties
Y Ground-truth output
Δt Time increment
γ Rate of change in kinematic modulus
Ŷ Output predicted from deep learning model
ν Poisson’s ratio
ρ Density
σo Initial yield stress
Af Area fraction
b Rate of change in yield surface size
BCC Body-centered cubic
C Initial kinematic hardening modulus
CFD Computational fluid dynamics
d Dilation factor
DL Deep learning
E Young’s modulus
f Sequence learning model
FCC Face-centered cubic
FE Finite element
FEA Finite element analysis
GPU Graphical processing unit
GRU Gated recurrent unit
H Specific enthalpy
HPC High-performance computing
k Thermal conductivity
L Length of outer square (case 1)
l Length of inner square (case 1)
LSTM Long short-term memory
MAE Mean absolute error
N Number of time steps
ncp Number of control points
ND Number of datapoints in the training dataset
PDEs Partial differential equations
Q Activation energy constant
q Flux
Q∞ Maximum change in yield surface size
ReLU Rectified linear unit
RUC Representative unit cell
SMAE Scaled mean absolute error
T Temperature
t Time
TCN Temporal convolutional network
u Displacement


2020; Teichert et al., 2019; Settgast et al., 2020; Abueidda et al., 2019a; Ali et al., 2019; Teichert et al., 2019; Al-Haik et al., 2006;
White et al., 2019; Haj-Ali et al., 2008; Yang et al., 2020). Mangal and Holm (2018, 2019) showed that machine learning can be used to
predict grains that become stress hotspots under deformation. Neural network models have been successfully utilized for predicting the
effective thermal conductivities of composites (Rong et al., 2019), the effective response of concrete (Unger and Könke, 2009),
dislocation microstructures (Zhang and Ngan, 2019), elastic response of carbon nanotubes (Papadopoulos et al., 2018), temperature-
and rate-dependent plasticity (Li et al., 2019), fatigue (Spear et al., 2018), and other problems. Moreover, deep learning models have
been employed to identify optimized topologies (e.g. Hamdia et al., 2019; Ulu et al., 2016; Abueidda et al., 2020b; Chen and Gu, 2020;
Kollmann et al., 2020).
Recognizing the relationship between property and structure for materials is a fundamental problem in solid mechanics and
materials science, as it has a vital role in creating next-generation materials (Al-Ketan et al., 2018; Al-Ketan et al., 2019; Al-Ketan and Abu
Al-Rub, 2019; Liu et al., 2019; Abueidda et al., 2019b; Abueidda et al., 2018; Abueidda et al., 2020a; Pham et al., 2019; Lee et al., 2017;
Brothers and Dunand, 2005; Abu Al-Rub et al., 2015; Gu et al., 2017). A key phenomenon required for designing novel materials is
plasticity. With the recent advances in 3D printing, one is capable of producing materials with unique geometries, where their
mechanical response exhibits plastic deformation. Such architectures (microstructures) require high computational cost due to the
intricate geometric designs. Predicting such responses can be remarkably challenging. Hashash et al. (2004), Jung and Ghaboussi
(2006), and Settgast et al. (2019) programmed user-defined material subroutines (UMATs) in which neural networks were trained and
used to replace the constitutive model integration in typical implicit nonlinear finite element solution procedures. Furthermore, the
phase transformation is a multiphysics problem, involving highly nonlinear constitutive modeling, while accounting for the coupled
thermo-viscoplastic behavior is crucial for obtaining a realistic response computationally (Koric and Thomas, 2006; Celentano, 2001;
Koric and Thomas, 2008; Maletta and Furgiuele, 2010; Sgambitterra et al., 2015).
Even though deep learning techniques can construct models for very intricate input-output relationships, their applications to
nonlinear materials are still limited. Problems with material nonlinearity and arbitrary loading paths, such as in plasticity,
viscoplasticity, and damage (Patriarca et al., 2020), are challenging to model by standard forward deep neural networks due to their
inability to handle sequential information. Also, while deep learning offers a platform that is capable of notably rapid inference, it
needs a “big” training dataset to “learn” the multifaceted relations between the outputs and inputs. The required size of the dataset
used for training is problem-based, where the more complex the problem is, the more data is required for accurate predictions.
Therefore, modeling of plastic deformation of designed materials and viscoplastic response of solidification processes using deep
learning needs to be approached from a different perspective.
Mozaffar et al. (2019) recently proposed a recurrent neural network (RNN) model to obtain the homogenized stresses and plastic
energy for two types of composites, given a deformation path (homogenized strain tensor). They showed that such a machine learning
model could precisely predict plastic response. RNNs are sequence learning models (Goodfellow et al., 2016) that can be beneficial for
many mechanics-related problems. Although RNNs are the best-known sequence learning models so far, recent studies have shown
that convolutional architectures convincingly outperform the recurrent architectures such as long short-term memory (LSTM)
(Hochreiter and Schmidhuber, 1997) and gated recurrent unit (GRU) (Chung et al., 2014). Bai et al. (2018) proposed a generic
temporal convolutional network (TCN) architecture for sequence learning tasks. They showed that TCNs exhibit substantially longer
memory and are more accurate than recurrent architectures such as GRU and LSTM, yet TCNs are simpler.
In our current study, the entire history-dependent response in plasticity and thermo-viscoplasticity is accurately predicted by the
trained sequence deep learning methods from the input data for a particular domain. It is not a part of any classical incremental
nonlinear FEA solution procedure, such as in UMATs based on neural networks, where the constitutive model integration is replaced at
every equilibrium iteration by a feedforward neural network. We develop sequence learning models to predict the entire path-
dependent responses for two cases. In the first case, we consider a unit cell of an elastoplastic cellular material with a varying
porosity, subject to time-dependent strain tensor under periodic boundary conditions. Given a deformation path, we train the sequence
learning models to predict the responses. In other words, sequence learning models are learning the plasticity-constitutive relations
such that it predicts the histories of homogenized stress tensor and plastic energy, beyond the elastic limit, from the loading conditions
and microstructure descriptors. We refer to this problem as case 1 in the context of this paper. To do so, we consider a variety of
sequence learning models. We include a comparison between the performance of the recurrent architectures (LSTM and GRU) and TCN
architecture. Such a comparison can be a baseline for mechanics-related problems requiring sequence learning.
In the second case (called case 2), we develop sequence learning models based on the LSTM, GRU, and TCN architectures to predict
stress and temperature histories at different locations in a solidifying steel slab, given a set of thermal and mechanical boundary
conditions. Over 96% of the world’s steel is produced by continuous casting, and plant experiments are limited due to the severe
temperature conditions inside the liquid and solidifying steel, and the many process variables influencing this vital industrial process.
Thus, numerical modeling offers an ideal tool to better understand these complex solidification phenomena in a caster, helping to
reduce steel defects and emissions of carbon dioxide and other pollutants, and to increase the productivity of the process. While full
multiphysics simulations with the real caster geometries have been performed lately by Zappulla et al. (2020) as well as multiscale
simulations (Moj et al., 2017), they required both sophisticated software and high-performance computing (HPC). With the data
generated from modeling this process, the performance of the different sequence learning algorithms is tested and compared in terms
of accuracy and training time. This case is notably more challenging due to the multiphysics (coupled thermo-mechanical response
with phase transformations) as well as complex viscoplastic constitutive models with the combined path, time, and temperature
dependencies.


2. Methods

2.1. Problem definition

Generally, obtaining a model describing the outputs (temporal outputs in this case) of a problem requires information about the
corresponding input space. For the input space, there are three types of variables: 1) material properties of each constituent, 2) loading
conditions, and 3) geometric features. Here, the sequence learning models play the role of the finite element (FE) models, after having
the models trained, in terms of mapping the input space to the corresponding output space:
ot = f(Ft, G, P, t) (1)

where ot is the temporal output, f is the sequence learning model (mapping function), Ft is the temporal loading conditions, G is the geometry descriptor, and P is the material properties of the constituents. In the present paper, we refer to a set of temporal and nontemporal inputs (Ft, G, and P) combined with its corresponding set of temporal outputs (ot) as a “datapoint.”

2.2. Finite element analysis (FEA)

The sampling of the input space of case 1 and case 2 is discussed in Sections 3.2 and 4.2, respectively, as each problem has its own
considerations. The sampled input database is analyzed using implicit static FEA (ABAQUS, 2014) to create the output database. The
process of creating the input and output databases is entirely automated. The input database is created using an in-house Python code that
performs the sampling and interacts with ABAQUS to do the required preprocessing, solve the finite element problem, complete the
required postprocessing, and store the results in the output database. The preprocessing step includes creating the geometry and
meshing it and assigning material properties and boundary conditions.

2.3. Datasets and loss function

After the creation of the input and output databases, the data created for each case are divided into training (81%), validation (9%),
and testing (10%) datasets. The training dataset is used to solve the optimization problem (minimizing the loss function) and find the
weights and biases of the sequence learning models (see Appendix A for further details). The validation dataset is the dataset not used
to find the optimized parameters of the different sequence learning models, but it is utilized to evaluate the convergence of the
different models as the training process progresses. In other words, after each epoch, the losses obtained from the validation and
training datasets are compared to identify issues with the training process, such as high bias or high variance (overfitting/underfitting).
Lastly, the testing dataset is used as a sanity check, to see whether the trained model generalizes to datapoints previously
unseen. The training process is probabilistic, and for better evaluation of the developed models, each model is trained multiple times.
In each run, the complete data are shuffled and randomly split into three datasets, training, validation, and testing datasets. More
details about the developed models and their training are discussed in the following sections.
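The shuffle-and-split procedure described above can be sketched in a few lines (a minimal NumPy version; only the 81%/9%/10% fractions come from the text, and the dataset size and seed are illustrative):

```python
import numpy as np

def split_dataset(n_datapoints, rng):
    """Shuffle datapoint indices, then split 81% / 9% / 10% (train/val/test)."""
    idx = rng.permutation(n_datapoints)
    n_train = int(0.81 * n_datapoints)
    n_val = int(0.09 * n_datapoints)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Re-shuffling with a fresh seed in each training trial gives a new random
# split of the complete dataset, as done here for each of the ten runs.
train, val, test = split_dataset(15500, np.random.default_rng(0))
```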
The loss function we use is based on the mean absolute error (MAE) (Willmott and Matsuura, 2005), which is defined as:

MAE = (1/ND) Σ_{i=1}^{ND} |Ŷi − Yi| (2)

where Y and Ŷ denote the ground-truth and predicted outputs, respectively, and ND is the total number of datapoints in the training
dataset. The loss function we minimize is the scaled mean absolute error (SMAE), where outputs are scaled, so they vary between 0 and
10. More specifically, we use a scaling factor for each output type. Scaling the outputs prevents the optimizer from focusing on types of
outputs with relatively higher values (e.g., energies versus stresses for case 1 and stresses versus temperatures for case 2). The loss
function is minimized using the Adam optimizer (Kingma and Ba, 2014). The sequence learning models are implemented using Keras
(Chollet et al., 2015) with a TensorFlow (Abadi et al., 2015) backend.
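Equation (2) and its scaled variant can be sketched as follows (a minimal NumPy illustration; the per-channel scaling factors are placeholders, since the paper states only that each output type is scaled to vary between 0 and 10):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over all entries (Equation (2))."""
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

def smae(y_true, y_pred, scale):
    """Scaled MAE: each output channel is multiplied by its own factor
    (chosen so the scaled outputs span roughly [0, 10]) before the absolute
    error is computed, preventing outputs with large numerical values
    (e.g., temperatures) from dominating the loss."""
    scale = np.asarray(scale)
    return mae(np.asarray(y_true) * scale, np.asarray(y_pred) * scale)
```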

2.4. Sequence learning models

Sequence learning models based on GRU and LSTM architectures have been used for solid mechanics-related problems, as discussed
in the work of Frankel et al. (2019) and Mozaffar et al. (2019). In this paper, besides the LSTM- and GRU-based models, sequence
models based on the TCN architecture are utilized. A brief introduction about deep learning in general and sequence learning, in
particular, can be found in Appendix A. Each of the LSTM-, GRU-, and TCN-based models consists of three stacked layers of either TCN,
GRU, or LSTM architectures, where a rectified linear unit (ReLU) activation layer is added after each layer to introduce nonlinearities
into the models (Xu et al., 2015). We have tested having 1, 3, and 5 stacked layers, and we found that 3 stacked layers balance the
computational cost against the minimization of errors. After the three stacked layers, a single time-distributed dense layer is added. The three
models have a comparable size (number of trainable parameters).
The GRU-based model is composed of three stacked layers of 490 GRU units (leading to a model with 3.5 million trainable parameters), while the LSTM-based model consists of three stacked layers of 420 LSTM units (leading to a model with 3.6 million trainable parameters). The TCN-based model is convolutional-based, where each layer consists of one residual block. The numbers of filters in the three layers are 64, 128, and 256, while the kernel sizes are 7, 4, and 2. The combination of large and small kernel sizes allows the model to capture global (from large kernels) and local (from small kernels) features. Another parameter that has to be defined for each layer is the set of dilation rates used for the residual block. The dilation rates used for the first layer are 1, 2, 4, 8, and 16, while the second layer has dilation rates of 1, 2, 4, 8, 16, and 32. The last TCN layer has dilation rates of 1, 2, 4, 8, 16, 32, and 64. With the above-defined parameters, the size of the model is 3.4 million trainable parameters. The hyper-parameters are selected such that overfitting and underfitting are avoided while maintaining a balance between computational cost and error minimization.

Fig. 1. Flowchart for the first illustrative example. Sequence learning models try to extract the average stresses and plastic energy, given an undeformed cellular material (defined by a nontemporal geometric constant Af = (L² − l²)/L²) and a deformation path (defined by applied average strain components).
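The dilated causal convolutions underlying these TCN layers can be sketched as follows (a minimal NumPy illustration; the one-convolution-per-dilation receptive-field count is a simplification, since a residual block in practice stacks two convolutions):

```python
import numpy as np

def causal_dilated_conv1d(x, w, d):
    """Causal 1D convolution with dilation factor d: the output at step t
    depends only on x[t], x[t-d], ..., x[t-(k-1)*d], never on the future."""
    k = len(w)
    y = np.zeros(len(x))
    for t in range(len(x)):
        for j in range(k):
            if t - j * d >= 0:
                y[t] += w[j] * x[t - j * d]
    return y

def receptive_field(kernel_sizes, dilation_sets):
    """Receptive field of stacked dilated causal layers, counting one
    convolution per dilation (a conservative lower bound)."""
    rf = 1
    for k, dils in zip(kernel_sizes, dilation_sets):
        rf += sum((k - 1) * d for d in dils)
    return rf

rf = receptive_field([7, 4, 2],
                     [[1, 2, 4, 8, 16],
                      [1, 2, 4, 8, 16, 32],
                      [1, 2, 4, 8, 16, 32, 64]])
```

With the kernel sizes and dilation rates listed above, even this conservative count gives a receptive field far larger than the 100 time steps used in this work, so every output step can see the entire loading history.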
In some cases, one has to vary nontemporal inputs (G and P) that affect the temporal outputs (ot). For instance, in case 1, the area fraction of the cellular material is varied, as discussed in Section 3.1. We found that it is efficient to deal with such inputs as a sequence (i.e., in the same way we deal with the temporal inputs (Ft)); however, the sequence has a constant value (e.g., area fraction in case 1) over the loading interval.
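Concretely, this treatment amounts to tiling each constant over the time axis and concatenating it with the temporal channels (the array shapes here are illustrative assumptions):

```python
import numpy as np

def combine_inputs(temporal, nontemporal):
    """Tile constant (nontemporal) inputs over the time axis and stack them
    with the temporal channels, so the whole datapoint is one sequence.

    temporal: (N, n_t) time-varying inputs (e.g., three strain components).
    nontemporal: (n_s,) constants (e.g., the area fraction Af).
    """
    n_steps = temporal.shape[0]
    tiled = np.tile(np.asarray(nontemporal, dtype=float), (n_steps, 1))
    return np.concatenate([temporal, tiled], axis=1)

# 100 time steps, three strain channels, plus one constant channel Af = 0.65.
x = combine_inputs(np.zeros((100, 3)), [0.65])
```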

3. Periodic elastoplastic cellular material (case 1)

3.1. Problem overview

Metallic cellular materials are of high scientific and industrial interest (Romano et al., 2020), as they lead to enhanced properties
that are not attainable by monolithic materials (e.g. Al-Ketan et al., 2018; Harris and McShane, 2020). In the first illustrative example,
we utilize sequence learning models to obtain path-dependent responses for a class of periodic elastoplastic cellular materials (see
Fig. 1). This geometry is chosen as an example, and in our future work, we will discuss generalizing the model for arbitrary geometry.
The cellular material is made of steel, obeying von Mises plasticity with combined isotropic and kinematic hardening (Lemaitre and
Chaboche, 1990).
The equilibrium equation, in the absence of body and inertial forces, is expressed as:
∇·σ = 0 (3)

where σ is the Cauchy stress tensor, ∇⋅ denotes the divergence operator, and ∇ is the gradient operator. The total strain rate ϵ̇ (for small
deformation) is decomposed into:
ϵ̇ = ϵ̇el + ϵ̇pl (4)


Table 1
Material properties used for case 1.
Young’s modulus, E: 200,000 MPa
Poisson’s ratio, ν: 0.3 [−]
Initial yield stress, σo: 200 MPa
Maximum change in yield surface size, Q∞: 2,000 MPa
Rate of change in yield surface size, b: 0.25 [−]
Initial kinematic hardening modulus, C: 22,220 MPa
Rate of change in kinematic modulus, γ: 34.65 [−]

where ϵ̇el and ϵ̇pl are the elastic and plastic strain rates, respectively. The elastic response is modeled using:

σ = Del : ϵ (5)

where Del is the fourth-order elasticity tensor, which is defined by two constants: Young’s modulus E and Poisson’s ratio ν, assuming
an isotropic material. ϵ represents the second-order strain tensor. For the adopted pressure-independent plasticity model, the yield
surface is defined by the following function:
F = f(σ − α) − σo = 0,  f(σ − α) = √[(3/2)(S − αdev) : (S − αdev)] (6)

where σo represents the yield stress, and f(σ − α) is the equivalent Mises stress with respect to the backstress α. αdev denotes the
deviatoric part of the backstress tensor, and S is the deviatoric stress tensor. Associated plastic flow is assumed:
ϵ̇pl = ϵ̂̇pl (∂F/∂σ) (7a)

ϵ̂̇pl = √[(2/3) ϵ̇pl : ϵ̇pl] (7b)

σo ϵ̂̇pl = σ : ϵ̇pl (7c)

where ϵ̂̇pl is the equivalent plastic strain rate defined in Equation (7b). The evolution of the equivalent plastic strain is attained using
Equation (7c).
The evolution law is composed of two components: 1) an isotropic hardening component representing the change of the equivalent
stress and determining the yield surface size σo, and 2) a nonlinear kinematic hardening component accounting for the translation of
the yield surface in stress space through the backstress α. The size of the yield surface σo is modeled with an exponential law for
materials experiencing cyclic hardening or softening:
σo = σ|0 + Q∞ (1 − exp(−b ϵ̂pl)) (8)

where σ |0 denotes the size of the yield surface at zero plastic strain. The evolution of the backstress (kinematic hardening) is defined as:
α̇ = C (1/σo) ϵ̂̇pl (σ − α) − γ α ϵ̂̇pl (9)
The material properties E, ν, σo, Q∞, b, C, and γ are summarized in Table 1 along with their numerical values used for case 1.
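As a quick numerical illustration (not from the paper), the isotropic hardening law of Equation (8) can be evaluated directly with the values in Table 1:

```python
import math

SIGMA_0 = 200.0  # initial yield stress [MPa] (Table 1)
Q_INF = 2000.0   # maximum change in yield surface size [MPa]
B = 0.25         # rate of change in yield surface size [-]

def yield_surface_size(eq_plastic_strain):
    """Exponential isotropic hardening law of Equation (8)."""
    return SIGMA_0 + Q_INF * (1.0 - math.exp(-B * eq_plastic_strain))
```

At zero plastic strain the yield surface size equals σo = 200 MPa, and it saturates toward σo + Q∞ = 2200 MPa at large equivalent plastic strain.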
We consider periodic cellular materials, where we obtain the responses of this cellular material under periodic boundary conditions
applied to unit cells with different area fractions. Given that σ and ϵ are the stress and strain tensors, respectively, we use the
overbar to denote a volume average of these tensors. In other words, σ and ϵ represent the volume average of the stress and strain
tensors, respectively, and they are defined as:

σ̄ = (1/V) ∫V σ dV,  ϵ̄ = (1/V) ∫V ϵ dV (10)

where V is the volume of the unit cell. Periodic boundary conditions are applied, i.e.,
u(x + L) = u(x) + ϵ̄ · L,  t(x + L) = −t(x),  ∀x ∈ ∂Bδ (11)

where u is the displacement, t denotes the traction, L is the periodicity length vector, and ∂Bδ represents the boundary of a unit cell. For
further details on the definition of volume average stresses and strains and application of periodic boundary conditions, the reader is
referred to the work of Jiang et al. (2002). The objective here is to develop a sequence learning model that takes nontemporal (geometry)


Fig. 2. Performance of the different models developed for case 1.

and temporal (components of an applied average strain tensor: ϵ̄11(t), ϵ̄22(t), and ϵ̄12(t)) inputs and extracts temporal outputs (the
corresponding plastic energy and components of the average stress tensor: σ̄11(t), σ̄22(t), and σ̄12(t)), as illustrated in Fig. 1. Plastic
energy here refers to the plastic dissipation experienced by the whole unit cell per unit volume. Since the geometry under consideration
is relatively simple, it can be described by a single constant: the area fraction of the solid Af.

3.2. Data description, generation, and processing

In case 1, we have temporal and nontemporal inputs, where the temporal inputs are the three components of the average strain
tensor, while the nontemporal input is the area fraction Af. Af is sampled from a uniform distribution, where the range of Af is
Af ∈ [0.35, 1.0]. Although Af is nontemporal, we deal with it as a sequence with a constant value over the loading interval, after getting
sampled, as discussed in Section 2.4. The sampling of temporal inputs is more intricate than nontemporal sampling, as one needs to
generate a sequence of points rather than a single point, as in the case of nontemporal inputs. Note that the deformation paths are
functions of pseudo-time, as the constitutive model is rate-independent.
In addition to the nontemporal input, each datapoint has three deformation paths, where each one represents a component of the
applied average strain tensor. A deformation path (strain component) is discretized into N = 100 steps of fixed size Δt = 1 s. We
generate random linear and nonlinear deformation paths. The deformation paths are generated using control points, where a control
point belongs to the strain versus time curve defining the strain history:

∙ For linear deformation paths, two control points are required to describe a deformation path. The average strain components at t =
100 s are drawn from a uniform distribution ([ − 5%, +5%]), while the strain components are set to zero at t = 0 s. Then, the two
control points are linearly connected (see Fig. B.6 for an example).
∙ For the nonlinear deformation paths, a number of control points ncp, equally spaced in time, are chosen and assigned random strain values.
Here, we use six control points, where the control points are defined at tcp = 0, 20, 40, 60, 80, and 100 s. The average strain values at
these control points are drawn from a uniform distribution, with a range of [ − 5%, +5%], except for tcp = 0 s, as the strains are set to
zero. After defining the control points, the average strain components are interpolated using a radial basis interpolation with a
Gaussian function (see the strain versus time curves in Fig. 1).
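The control-point procedure above can be sketched as follows; we implement the Gaussian radial basis interpolation directly in NumPy, and the shape parameter epsilon is an assumed value, since the paper does not report it:

```python
import numpy as np

def gaussian_rbf_path(t_cp, strain_cp, t_eval, epsilon=0.02):
    """Interpolate control-point strains with Gaussian radial basis functions.

    t_cp: control-point times (0, 20, ..., 100 s); strain_cp: strain values
    at those times (zero at t = 0); t_eval: evaluation times (e.g., 100 steps
    of 1 s). epsilon is the RBF shape parameter (assumed here).
    """
    t_cp = np.asarray(t_cp, dtype=float)
    phi = lambda r: np.exp(-(epsilon * r) ** 2)  # Gaussian basis function
    A = phi(np.abs(t_cp[:, None] - t_cp[None, :]))  # interpolation matrix
    weights = np.linalg.solve(A, np.asarray(strain_cp, dtype=float))
    B = phi(np.abs(np.asarray(t_eval, dtype=float)[:, None] - t_cp[None, :]))
    return B @ weights

# One random nonlinear deformation path: six control points, zero at t = 0,
# the rest drawn uniformly from [-5%, +5%], evaluated at 1 s intervals.
rng = np.random.default_rng(0)
t_cp = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
strain_cp = np.concatenate([[0.0], rng.uniform(-0.05, 0.05, size=5)])
path = gaussian_rbf_path(t_cp, strain_cp, np.arange(0.0, 101.0))
```

By construction, the interpolated path passes exactly through the control points, while the Gaussian basis keeps the curve smooth between them.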

After having the deformation paths (the history of average strain components) defined along with the area fraction Af, these inputs
are fed into the ABAQUS environment and converted into a periodic boundary value problem to extract the outputs (see Fig. 1), as
discussed in Section 2.2. The outputs are the temporal average stress components and plastic energy. A total of 15,500 datapoints were
generated using high-throughput computing on three nodes of the NCSA’s high-performance computing cluster iForge. 20% of the
generated data have linear deformation paths, and the remaining 80% are with nonlinear deformation paths.

3.3. Results and discussion

After generating the data, three sequence learning models based on LSTM, GRU, and TCN units are trained on an iForge node with a
graphical processing unit (GPU) for 150 epochs. We use the testing datasets to evaluate the accuracy of the developed sequence
learning models. Note that a testing dataset is never seen by the deep learning models throughout the training process, and it is
different from the validation dataset (see Section 2.3 for more details).
Training is a stochastic process and can get trapped in local minima. Each model is trained ten times. Examples of the convergence
plots of the different sequence learning models are discussed in Appendix B.1. In each training trial, the complete dataset is shuffled
and split into three datasets, as discussed in Section 2.3. The SMAE’s of the testing datasets, corresponding to the ten training trials, are


Fig. 3. A comparison between the results obtained from the different sequence learning models (case 1).


Fig. 4. An example for predicting stresses and energy using the TCN-based model under uniaxial sinusoidal loading (case 1).

calculated, and we report their average in Fig. 2 along with the standard deviations (error bars). Fig. 2 portrays the performance of the
different models in terms of the average SMAE and average training time. The variation in the training time from one training trial to
another is minute; hence, it is not shown in Fig. 2 for clarity. The SMAE means of the LSTM-based, GRU-based, and TCN-based
models are 0.0146, 0.0052, and 0.0061, respectively. Please note that the training of the LSTM-based model is highly unstable and
diverges frequently. The training instances, where divergence is encountered, are not used in the performance plot (see Fig. 2).
Divergence issues were not encountered when the GRU-based or TCN-based model is used. The average training times for the LSTM-
based, GRU-based, and TCN-based models are 3.5, 4.25, and 0.5 hours, respectively. The SMAE of the GRU-based and TCN-based models
are close to each other, while the TCN-based model is 8.5 times faster to train. Fig. 3 illustrates a comparison between the results
obtained from the LSTM-based, GRU-based, and TCN-based models, given an input (Af and deformation paths) of a datapoint drawn
from the testing dataset.
Please note that the LSTM-based model occasionally captures the actual response obtained from the finite element analysis, as
shown in Fig. 3, and it fails most of the time, as discussed in Appendix B.1. However, the GRU-based and TCN-based models
consistently capture the response. Appendix B.1 includes additional examples for the results obtained from the different sequence
learning models, where different scenarios of deformation paths and area fractions have been considered. The results obtained from
the sequence learning models are compared with those obtained from the FEA (see Figures B.2-B.8).
Also, we check if the model can generalize in the sense that it predicts the outputs given deformation paths that are naturally
different from those we used for training. Fig. 4 shows the deformation paths and average stresses for the case of uniaxial loading. More
specifically, we apply a sinusoidal average strain (ϵ11 = 0.025 sin(0.25t)) while the other components vanish (ϵ22 = ϵ12 = 0). Note
that the training dataset does not include any deformation paths with sinusoidal deformation paths; it just includes linear deformation
paths and nonlinear deformation paths created by regression between control points. We compare the predicted responses obtained
from the sequence learning model with the ground-truth responses. The TCN-based model is still able to predict the stresses, including
the Poisson effect (σ22 ≠ 0), as shown in Fig. 4.
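For illustration, the sinusoidal deformation path used in this generalization check can be tabulated as a network input as follows; the number of load steps and the time window are assumptions for this sketch, not values taken from the paper:

```python
import numpy as np

# Sinusoidal uniaxial path used for the generalization check:
# eps11(t) = 0.025*sin(0.25*t), with eps22 = eps12 = 0.
n_steps = 100                          # assumed number of load steps
t = np.linspace(0.0, 40.0, n_steps)    # assumed time window (> one period)
eps11 = 0.025 * np.sin(0.25 * t)
eps22 = np.zeros_like(t)
eps12 = np.zeros_like(t)
path = np.stack([eps11, eps22, eps12], axis=-1)  # shape (n_steps, 3)
```

The resulting array, one strain triple per time step, is the kind of sequence the trained model maps to the stress and energy histories.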
Such sequence learning models would be beneficial for the analysis of metallic and polymeric materials. They can also significantly
increase computational efficiency in the semi-concurrent nonlinear multiscale methods such as FE2 (Yuan and Fish, 2008), where
deformation gradient (strain measure) is passed from every integration point on a macro-scale to a representative unit cell (RUC) with
periodic boundary conditions, and where the stress measure can be obtained almost instantly from deep learning and sent back to the
macro-scale instead of numerically solving a materially nonlinear RUC boundary value problem. Eventually, such models would assist
in designing novel materials.

4. Thermo-viscoplasticity in steel solidification (case 2)

4.1. Problem overview

In the next test problem, we apply our deep learning model to a considerably more challenging thermo-viscoplasticity use case in a
multiphysics simulation of solidifying steel in the continuous casting process. We employ here a simplified geometry model that
nevertheless carries enough "multiphysics intelligence" from the deep learning training to instantly infer realistic local temperature
and stress distributions in the solidifying shell of low-carbon steel in a continuous caster, even on low-end computing platforms such
as laptops.
The thermal behavior of the solidifying shell is governed by the transient heat conduction equation given in Equation (12):

ρ ∂H(T)/∂t = ∇·(k(T)∇T)    (12)


where T is the temperature, ρ denotes the temperature-dependent density, k is the isotropic temperature-dependent thermal
conductivity, and H is the temperature-dependent specific enthalpy, which includes the latent heat released during phase
transformations such as solidification and the transition from δ-ferrite to austenite.
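As an illustration of how Equation (12) can be advanced in time, the sketch below performs one explicit finite-difference step of the enthalpy form in 1D. The material curves and the discretization are placeholder assumptions, not the temperature-dependent steel properties or the implicit scheme used in this work:

```python
import numpy as np

def explicit_enthalpy_step(T, H_of_T, T_of_H, k_of_T, rho, dx, dt):
    """One explicit finite-difference step of rho*dH/dt = d/dx(k dT/dx) in 1D.

    H_of_T / T_of_H describe the (invertible) enthalpy-temperature curve,
    which embeds the latent heat of phase change; k_of_T is the conductivity.
    All material curves here are placeholders, not the paper's steel data.
    """
    k = k_of_T(T)
    k_face = 0.5 * (k[1:] + k[:-1])            # conductivity at cell faces
    flux = k_face * (T[1:] - T[:-1]) / dx      # q = k dT/dx at faces
    H = H_of_T(T)
    H[1:-1] += dt / (rho * dx) * (flux[1:] - flux[:-1])  # interior update
    return T_of_H(H)
```

With a simple linear enthalpy H = c·T the update reduces to ordinary explicit conduction; a piecewise-linear H(T) with a latent-heat plateau reproduces the solidification front behavior.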
Inertial effects are negligible in solidification problems, so using the quasi-static mechanical equilibrium in Equation (13) as the
governing equation is appropriate (Koric and Thomas, 2006; Koric and Thomas, 2008; Zappulla et al., 2020; Huespe et al., 2000; Li and
Thomas, 2004):
∇·σ + b = 0 (13)

where σ is the Cauchy stress tensor, and b is the body force density vector. Rate-independent plastic constitutive models are generally
suitable for engineering purposes at room temperatures for most metallic materials. However, at elevated temperatures up to the
solidus temperature, most metals show significant time- and temperature-dependent inelastic behavior while phase transformations
add further modeling difficulties. The rate representation of total strain in this thermo-elastic-viscoplastic model is given by Equation
(14):
ϵ̇ = ϵ̇el + ϵ̇ie + ϵ̇th (14)

where ϵ̇, ϵ̇el , ϵ̇ie , and ϵ̇th are the total, elastic, inelastic, and thermal strain rate tensors, respectively. The stress and strain rates are linked
by the following constitutive equation:
σ̇ = D : (ϵ̇ − ϵ̇ie − αṪ)    (15)

where the thermal strain rate ϵ̇th is the product of the temperature rate Ṫ and the isotropic thermal expansion tensor α, and D is the
fourth-rank elastic stiffness tensor defined as:
D = 2μ𝕀 + (κB − (2/3)μ) I ⊗ I    (16)

where μ and κB are the shear and bulk moduli, respectively, and are functions of temperature; 𝕀 and I are the fourth- and second-order
identity tensors, respectively; and ⊗ and : denote the outer and inner tensor products, respectively. In the case of finite deformation,
the objective stress rate is provided by the Jaumann stress rate σ̂:

σ̂ = σ̇ + σ·W − W·σ    (17)

where W is the spin tensor, i.e., the anti-symmetric part of the velocity gradient.
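The kinematics of Equation (17) can be illustrated directly; the sketch below computes the spin tensor from an assumed velocity gradient and evaluates the Jaumann rate, with the single contractions written as matrix products:

```python
import numpy as np

def jaumann_rate(sigma_dot, sigma, L):
    """Jaumann (co-rotational) stress rate from the velocity gradient L.

    W is the spin tensor, the anti-symmetric part of L. Sign conventions
    follow Equation (17); the contractions are ordinary matrix products.
    """
    W = 0.5 * (L - L.T)                  # spin tensor
    return sigma_dot + sigma @ W - W @ sigma
```

For an isotropic (hydrostatic) stress state the convective terms cancel, and for any symmetric σ the result remains symmetric, as a stress rate must be.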
Previous works have used three main approaches to model viscoplasticity (Chen et al., 2017; Page and Hartmann, 2018). The first
approach is to develop constitutive models with evolving yield surface, see Chaboche (1989), Johnson and Cook (1983), Mayama and
Sasaki (2006), and Zerilli and Armstrong (1987). Most of these constitutive models resemble traditional plasticity theory, consisting of
yield criteria, plastic flow rules, and hardening rules. Behavior during loading and unloading may differ according to the yield surface,
and the flow rule defines the evolution of inelastic strain. Hardening rules vary from isotropic, where the yield surface expands with
increasing stress, to kinematic, where the yield surface remains the same shape and size but translates in stress space, with
combinations of the two used to model more complex behavior.
The second approach is to develop constitutive models without an explicit definition of a yield surface. These can be further divided
into two types: Uncoupled constitutive models (that consider creep separately) such as Garofalo et al. (1963) and Hoff (1954), and
coupled (unified) constitutive models, which treat rate-independent plastic strain and creep as a combined inelastic strain, such as
Brown et al. (1989), Kozlowski et al. (1992), and Safari et al. (2013). Temperature dependency is often incorporated into these models
by an Arrhenius relationship, to make them thermo-viscoplastic. The third approach is based on a priori decomposition of the total
stress into rate-independent equilibrium stress and a rate-dependent overstress, also called overstress-type viscoplasticity, see Haupt
and Sedlan (2001) or Hartmann (2006). Numerous variations of viscoplastic constitutive laws have been built from these three major
types; interested readers should consult previous reviews such as Liang and Khan (1999) and Chaboche (2008). Coupled viscoplastic
constitutive models, which consider rate-dependent creep and rate-independent plasticity together as inelastic strain, are particularly
prevalent in steel casting.
Based on experimental data from creep and tensile tests, Kozlowski et al. (1992) introduced a unified constitutive equation, shown
in Equation (18), to model the austenite phase of solidifying steel below the solidus temperature:
ϵ̇ie = fc (σvM − f1 ϵie ϵie^(f2−1))^f3 exp(−Q/T)    (18)

where

Q = 44,465
f1 = 130.5 − 5.128×10⁻³ T
f2 = −0.6289 + 1.114×10⁻³ T
f3 = 8.132 − 1.54×10⁻³ T
fc = 46,550 + 71,400(%C) + 12,000(%C)²,

where Q (in K) is the activation constant, defined as the activation energy (in J/mol) divided by the gas constant (in J/(mol·K)). The empirical


Fig. 5. Upper part of a slab caster with the slice domain.

Fig. 6. Thermal and mechanical boundary conditions.

functions f1, f2, f3, and fc depend on the absolute temperature T (in K), and %C is the carbon content (weight percent) representing the
steel grade (composition). This unified constitutive equation relates the inelastic strain rate ϵ̇ie to the difference between the von Mises
effective stress σvM and an inelastic strain term f1 ϵie ϵie^(f2−1), which is a form of backstress hardening parameter used in the work of
Senior (1990). This term is highly temperature-dependent, and Li and Thomas (2004) have shown that it permits an algorithmic
estimate of kinematic hardening, where σvM and ϵie are assigned according to the largest principal stress and strain rate components,
respectively. ϵ̇ie and σvM are in s⁻¹ and MPa, respectively. The initial value of the inelastic strain tensor at the solidus temperature is set
equal to the plastic strain tensor from the elastic-perfectly-plastic constitutive model with a small yield stress, which is used in this
work to model the volatile mushy and liquid zones at temperatures above the solidus temperature.
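For illustration, Equation (18) and its empirical coefficients can be evaluated as follows; the guard against a non-positive stress term and the function name are our assumptions for this sketch, not part of the published model:

```python
import numpy as np

def kozlowski_rate(sigma_vm, eps_ie, T, pct_C):
    """Inelastic strain rate (1/s) from Equation (18), austenite phase.

    sigma_vm in MPa, eps_ie dimensionless, T in Kelvin, pct_C in weight
    percent. Returns zero when the stress does not exceed the backstress-like
    term (an assumed regularization; the authors' implementation may treat
    this case differently).
    """
    Q = 44465.0
    f1 = 130.5 - 5.128e-3 * T
    f2 = -0.6289 + 1.114e-3 * T
    f3 = 8.132 - 1.54e-3 * T
    fc = 46550.0 + 71400.0 * pct_C + 12000.0 * pct_C ** 2
    # backstress-like term f1 * eps * |eps|^(f2-1); guard the eps = 0 case
    drive = sigma_vm - f1 * eps_ie * abs(eps_ie) ** (f2 - 1.0) if eps_ie != 0 else sigma_vm
    if drive <= 0.0:
        return 0.0
    return fc * drive ** f3 * np.exp(-Q / T)
```

The strong Arrhenius factor exp(−Q/T) is what makes the response so temperature-sensitive near the solidus.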
Zhu (1996) devised a separate power-law-based constitutive model to incorporate the much higher creep rates of the weaker
body-centered cubic (BCC) δ-ferrite phase of steel, compared to the stronger face-centered cubic (FCC) austenite phase.
Equation (19) shows this power-law-based constitutive model:
ϵ̇ie = 0.1 |σvM / (fδc(%C) (T/300)^(−5.52) (1 + 1000 ϵie)^m)|^w    (19)

where

fδc(%C) = 1.3678×10⁴ (%C)^(−5.56×10⁻²)
m = −9.4156×10⁻⁵ T + 0.3495
w = 1/(1.617×10⁻⁴ T − 0.06166).

Based on the phase fraction calculations (austenite or ferrite), both of these constitutive models have been used together in many
steel solidification papers, such as by Huespe et al. (2000), Li and Thomas (2004), Koric and Thomas (2006), Wang et al. (2007), Safari
et al. (2013), and Qin et al. (2018). Due to the highly nonlinear inelastic responses, an implicit method is applied to integrate the
Kozlowski-Zhu constitutive models shown in Equations (18) and (19) as well as the rate representation of stress-strain constitutive
relation shown in Equation (15). However, with the help of Lush et al. (1989), the algebraic tensor equations in Equation (15) are
reduced to a scalar equation for isotropic materials. The remaining system of two nonlinear equations is solved by a special bounded
Newton–Raphson method devised by Koric and Thomas (2006). Further details of this computationally efficient local solution scheme,
implemented in ABAQUS via a user-defined subroutine UMAT, including the Jacobian consistent with this method and the treatment
of liquid and mushy zones with an isotropic elastic-perfectly-plastic rate-independent constitutive model with a small yield stress, can
be found in the work of Koric and Thomas (2006). This integration method provided an order-of-magnitude improvement in
computational efficiency (required time) over the built-in integration methods in ABAQUS when solving thermo-mechanical
solidification problems. It has been used in this work to generate the viscoplastic training and testing data. It is worth mentioning that
the enhanced latent heat method (Koric et al., 2010b) was later added to account for


Fig. 7. Example of flux and displacement profiles.

Fig. 8. Performance of the different models developed for case 2.

non-uniform superheat fluxes from the turbulent computational fluid dynamics (CFD) calculations in the liquid steel phase, making a
realistic multiphysics modeling of metal solidification processes possible (Koric et al., 2010a).
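The bounded Newton-Raphson idea mentioned above, i.e., Newton iterations that are kept inside a physical bracket, can be sketched generically. This is not the specific scheme of Koric and Thomas (2006), only an illustration of the safeguarding concept on a scalar residual:

```python
def bounded_newton(res, dres, lo, hi, tol=1e-10, max_iter=50):
    """Safeguarded scalar Newton iteration: a Newton step is accepted only
    while the iterate stays inside [lo, hi]; otherwise bisection is used.

    Generic sketch of the safeguarding idea, not the authors' exact solver.
    """
    assert res(lo) * res(hi) <= 0.0, "root must be bracketed"
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):
        f = res(x)
        if abs(f) < tol:
            return x
        d = dres(x)
        x_new = x - f / d if d != 0.0 else None
        if x_new is None or not (lo < x_new < hi):
            x_new = 0.5 * (lo + hi)      # bisection fallback
        # maintain the bracket so the iterate can never escape
        if res(lo) * res(x_new) <= 0.0:
            hi = x_new
        else:
            lo = x_new
        x = x_new
    return x
```

Bounding the iterate is what keeps the local constitutive solve robust when the residual is extremely stiff, as it is for Equations (18) and (19) near the solidus.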
For long objects under thermal loading, a generalized (constant) plane strain condition can recover the 3D stress-strain state reasonably
well (Koric and Thomas, 2006; Li and Thomas, 2004). Due to the large length and width of the casting, it is reasonable to apply the
generalized plane strain condition in both the y- and z-directions in a slice domain through the solidifying shell (see Fig. 5), traveling
down the mold for 17 s in a Lagrangian frame of reference. While generalized plane strain elements provide this condition in the
casting (z) direction, the lower edge nodes are forced to displace vertically by equal amounts to provide the generalized plane strain
condition in the y-direction (see Fig. 6). For more details about the thermo-mechanical model for steel solidification applied here, as
well as the temperature-dependent nonlinear material properties, one can consult the works of Koric and Thomas (2006) and Koric
et al. (2010a).

4.2. Data description, generation, and processing

A local temporal profile of thermal flux (Li and Thomas, 2004) leaving the chilled steel surface as well as its movement due to a
contact interaction with the mold wall taper and mold thermal deformation (Zappulla et al., 2020) can be measured at a plant or
calculated. Thus, the thermal natural and mechanical essential boundary conditions for the solidifying slice domain are obtained.
Based on experimental observations, the flux has an overall decaying profile, while the displacement profile has an increasing trend.
The flux and displacement profiles evolve to the end state in N = 100 steps of fixed size Δt = 0.17 s (the total time is 17 s). For each
datapoint, ncp = 6 control points are chosen. The first control point is at t = 0 s and the last at t = 17 s, while the physical
times of the four remaining control points tcp are sampled from a uniform distribution (their range is tcp ∈ (0, 17) s). Then, the flux qcp
(in MW/m²) and displacement ucp (in mm) values at the control points tcp are obtained using Equation (20):

qcp = B (tcp + 1)^(−r)
ucp = D tcp    (20)

where B, r, and D are drawn from uniform distributions (B ∈ [4, 7], r ∈ [0.3, 0.7], and D ∈ [0.003, 0.012]). Equation (20) and the ranges
of B, r, and D are chosen to recreate measured and calculated values of the chilled-surface heat flux and displacement profiles (Li and
Thomas, 2003). Once the control points are defined, a radial basis interpolation with a Gaussian function is used to connect
(interpolate) the points to emulate additional fluctuations and nonlinearities observed in the actual flux and displacement profiles due


Fig. 9. A comparison between the results obtained from the different sequence learning models (case 2).


to sudden changes in contact conditions and interfacial heat transfer between mold and steel (Zappulla et al., 2020; Li and Thomas,
2004). Fig. 7 shows an example of flux and displacement profiles.
Then, the inputs are fed into the finite element computational environment linked with the user-defined solidification subroutines
(Koric and Thomas, 2006; Zappulla et al., 2020), and a coupled boundary value thermo-mechanical problem in Equations (12) and
(13) is solved numerically to extract the temperature and stress outputs. The outputs are the stresses and temperatures at different
locations on the slab. Since the quality of the casting is known to depend mostly on the solidification conditions at the chilled surface and
its subsurface (Li and Thomas, 2004) (see Fig. 6), the temperature and lateral stress histories at four points along the bottom edge are
stored: at the chilled surface (node 1) and 1 mm (node 21), 2 mm (node 41), and 3 mm (node 61) into the domain interior. The
stresses are stored at the element integration points closest to the temperature nodes (from the left), where the element numbers are 1, 10,
20, and 30. A total of 14,500 datapoints were generated using high-throughput computing on three nodes of NCSA's
high-performance computing cluster iForge.

4.3. Results and discussion

After the generation of the input and output databases, three models based on LSTM, GRU, and TCN architectures are trained on an
iForge node with a GPU. The models are trained ten times, each for 150 epochs. Examples of the convergence plots of the different
sequence learning models are discussed in Appendix B.2. Once the ten training trials are completed, the corresponding testing datasets
are used to evaluate the predictions of the three deep learning models. Fig. 8 depicts the performance of the different models in terms of
the average SMAE of the testing datasets and training times. The SMAE means of the LSTM-based, GRU-based, and TCN-based models
are 0.0187, 0.00877, and 0.00877, while the average training times on GPUs are 3.0, 4.0, and 0.5 h, respectively. The LSTM-based
model has the highest error values among the models, while the GRU-based and TCN-based models have similar error values. However,
the TCN-based model is eight times more computationally efficient to train than the GRU-based model. Similar to case
1, the training instances of the LSTM-based model where divergence is encountered are not used in the performance plot.
Fig. 9 shows, for a particular test sample, the histories of temperatures and lateral stresses for the four output points, both ground-
truth (actual calculated FEA results) and predicted by the three sequence learning models. Note that we have checked the results from
many other test samples, and every time the highly complex and coupled thermo-mechanical viscoplastic phenomena of steel
solidification, which generate both tensile and compressive residual stresses on the chilled surface and its subsurface, were almost ideally
recovered by the GRU and TCN deep learning models. In contrast, the LSTM model only approximately predicted the temperatures and
stresses mainly due to its limited capabilities to handle long sequences of data such as here with 100 time steps. Appendix B.2 includes
additional examples, with various complex mechanical and thermal loading scenarios, of the results obtained from the different
sequence learning models. The results are compared with those obtained from the FEA (see Figs. B.10-B.15).
Furthermore, Bai et al. (2018) noted that for even longer ranges of information propagation, the "infinite memory" advantage of even
the most sophisticated canonical RNN architectures, such as GRU, will likely be absent, and that convolutional neural network
architectures with temporal capabilities, such as TCN, are more computationally efficient to train and can handle diverse sequence
modeling tasks at the confluence of artificial intelligence and the modeling of materials and processes with complex constitutive
behavior, such as thermo-viscoplasticity. Finally, once the neural network model is appropriately trained for a specific class of
viscoplastic solidification problems, its training parameters (weights and biases) can be transferred to even low-end platforms such as
laptops for inference, i.e., to provide almost instant thermo-mechanical results that are in very good agreement with the numerical
modeling results obtained from the complex and computationally expensive multiphysics viscoplastic framework.

5. Conclusions

We have applied and compared long short-term memory (LSTM) and gated recurrent unit (GRU) recurrent neural networks and
temporal convolutional network (TCN) to a periodic elastoplastic material as well as to a more complex thermo-viscoplastic steel
solidification model. We found that although the TCN-based and GRU-based models could precisely learn and reproduce the extremely
complex behavior of these plastic materials, the TCN-based model is more computationally efficient. Moreover, the developed deep
learning model is robust, and it is in excellent agreement with the numerical results obtained from the conventional numerical methods
in plasticity. Once appropriately trained on a high-end computing system with sufficient sequenced data generated from classical
numerical plasticity simulations, the deep learning models developed in this work can almost instantly produce good-quality results for
unseen input data on low-end computing platforms such as laptops. The results of this work, coupled with the recent finite element
method advances on HPC (Koric and Gupta, 2016; Vázquez et al., 2016; Borrell et al., 2018), may lead to broader adoption of
data-driven deep learning models in similarly computationally challenging material plasticity and inelasticity constitutive models with
combined rate, temperature, and loading path dependencies. In our future work, we plan to apply these deep learning models to other
challenging history-dependent plasticity phenomena such as ratcheting under multiple-step cyclic loading (Jiang and Sehitoglu, 1994;
Jiang and Sehitoglu, 1996).

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


CRediT authorship contribution statement

Diab W. Abueidda: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization,
Writing - original draft, Writing - review & editing. Seid Koric: Conceptualization, Data curation, Formal analysis, Funding
acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing -
original draft, Writing - review & editing. Nahil A. Sobh: Conceptualization, Formal analysis, Methodology, Software, Writing -
original draft, Writing - review & editing. Huseyin Sehitoglu: Conceptualization, Formal analysis, Methodology, Supervision, Writing
- original draft, Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.

Acknowledgment

The authors would like to thank the National Center for Supercomputing Applications (NCSA) Industry Program and the Center for
Artificial Intelligence Innovation for software and hardware support. The authors thank the Continuous Casting Center at the Colorado
School of Mines for the support. The authors thank Dr. Aiman Soliman for the helpful discussions.

Appendix A. Introduction to neural networks and sequence learning

Appendix A.1. Feedforward dense artificial neural networks

Deep learning is a special type of machine learning that works with different types of artificial neural networks (ANN). Inspired by
the structure and functionality of a brain, artificial neural networks are layers of interlinked individual units, called neurons, where
each performs a specific function given input data. As illustrated in Fig. A.1, the most straightforward neural network, the so-called dense
feedforward neural network, is composed of linked layers of neurons that map the input data to predictions. The depth of an
artificial neural network is determined by the number of hidden layers, i.e., the layers between the input and output layers.

Fig. A.1. Dense artificial neural network.


Upon initialization, the weights W and biases B of the model will be far from ideal. Throughout the training process, as
schematically depicted in Fig. A.2, the data flow goes forward from input to output in the neural network. The output ô of a
layer n is calculated as:

Z^n = W^n ô^(n−1) + B^n
ô^n = f^n(Z^n)    (A.1)

where W^n and B^n are updated after each training pass, and f^n is the activation function that transforms Z^n into an output ô for every
neuron. Neural networks use easily differentiable nonlinear activation functions, which help the network learn complex functional
relationships between inputs and outputs and thus provide accurate predictions. Examples of activation functions commonly
used in neural networks include the rectified linear unit (ReLU), sigmoid, and hyperbolic tangent.
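A minimal sketch of the forward pass in Equation (A.1), with the layer list and activation helpers as our own illustrative conventions:

```python
import numpy as np

def forward(x, layers):
    """Forward pass of a dense network per Equation (A.1).

    `layers` is a list of (W, B, f) triples, one per layer; f is the layer's
    activation function.
    """
    o_hat = x
    for W, B, f in layers:
        Z = W @ o_hat + B     # affine map Z^n = W^n o^(n-1) + B^n
        o_hat = f(Z)          # activation o^n = f^n(Z^n)
    return o_hat

relu = lambda z: np.maximum(z, 0.0)
identity = lambda z: z
```

Stacking more (W, B, f) triples in the list deepens the network without changing the loop.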


Fig. A.2. Generic training process in ANN.


The loss function ℒ then compares those predictions to the targets, producing a loss value that measures how well the
network's predictions match what was expected. Then, in a process called backpropagation, the optimizer iteratively minimizes the
loss value with gradient descent, as shown in Equation (A.2):
W_ij^(k+1) = W_ij^k − β ∂ℒ/∂W_ij^k
B_i^(k+1) = B_i^k − β ∂ℒ/∂B_i^k    (A.2)

or with similar optimization techniques. The gradients of the loss function ℒ are first calculated with respect to the weights and biases in
the last layer, and the weights and biases of each node are adjusted; the same process is then repeated for the previous layer, and so on,
until all layers have had their weights adjusted. A new iteration k with forward propagation then starts again. After a reasonable
number of iterations, the weights W^k and biases B^k converge toward a minimized loss value. The parameter β is called the learning
rate, and it is a vital parameter in the neural network learning process.
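The update of Equation (A.2) can be demonstrated on a tiny least-squares problem, where the loss gradient is available in closed form; the data and learning rate below are illustrative:

```python
import numpy as np

# Gradient descent per Equation (A.2) on a small least-squares problem:
# minimize L(w) = ||X w - y||^2, whose gradient is 2 X^T (X w - y).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.zeros(2)
beta = 0.1                        # learning rate
for k in range(500):
    grad = 2.0 * X.T @ (X @ w - y)
    w = w - beta * grad           # the Equation (A.2) update
```

The system is consistent, so the iterates converge to the exact solution w = (1, 2); too large a β would instead make them diverge, which is why the learning rate matters.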

Appendix A.2. Recurrent neural networks

A recurrent neural network (RNN) is a type of artificial neural network commonly used for training sequences of events that follow
one another in time. For a given input sequence x0, …, xT, the matching outputs ŷ0, …, ŷT at each time step are predicted. For an output
yt at some time t, we are limited to using the inputs observed so far, x0, …, xt. Therefore, a sequence learning
network is a function performing the following mapping F:

F : X^(T+1) → Ŷ^(T+1) with mapping ŷ0, …, ŷT = F(x0, …, xT).    (A.3)
The goal of sequence learning is to minimize the loss function

loss = loss(ŷ0, …, ŷT, y0, …, yT),    (A.4)

which defines the deviation of the prediction Ŷ from the actual target Y. Unlike feedforward neural networks, RNNs contain recurrent
computing cycles and use an internal state memory h to process sequences of inputs. A basic RNN is described by the propagation
equations shown in Equation (A.5):

ht = f(U xt + W ht−1 + B)
ŷt = V ht + c    (A.5)

where B and c denote the bias vectors, and U, V, and W are the weight matrices for the input-to-hidden, hidden-to-output, and
hidden-to-hidden connections, respectively; f is the activation function. Fig. A.3 portrays the computational model along with its
unfolded version in time.
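The propagation equations in Equation (A.5) unroll into a simple loop; a minimal numpy sketch (tanh is assumed as the activation f):

```python
import numpy as np

def rnn_forward(xs, U, W, V, B, c):
    """Unrolled vanilla RNN per Equation (A.5), tanh activation.

    xs has shape (T, n_in); returns predictions of shape (T, n_out).
    """
    h = np.zeros(W.shape[0])
    ys = []
    for x_t in xs:
        h = np.tanh(U @ x_t + W @ h + B)   # hidden state update
        ys.append(V @ h + c)               # output at time t
    return np.array(ys)
```

Note that the same U, W, and V are reused at every time step; it is exactly this repeated application over long sequences that makes the computational graph deep.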


Fig. A.3. Recurrent neural network.


Recurrent neural networks build very deep computational graphs by repeatedly (recurrently) applying the same operation at each
time step of a long time sequence. This may give rise to the vanishing or exploding gradient problems during backpropagation, making
it challenging to train long sequence data with traditional RNNs. The long short-term memory (LSTM) and gated recurrent unit (GRU)
are special recurrent neural network architectures devised to address the vanishing and exploding gradient problems. Memory (state)
cells in LSTM and GRU networks are designed to dynamically "forget" old and unnecessary information via special gated units
that control the flow of information inside a cell, thus avoiding the multiplication of long sequences of numbers during temporal
backpropagation. A comprehensive overview of LSTM and GRU networks can be found in the work of Pattanayak (2017).

Appendix A.3. Temporal convolutional network

Temporal convolutional networks (TCN) belong to a family of neural network architectures that incorporate one-dimensional
convolutional layers (Conv1D). The standard (two-dimensional) version of convolutional neural networks (CNNs) has customarily
been accepted as the best deep learning model available today for analyzing images and extracting knowledge from them.
Convolution is the operation of running a smaller matrix (known as a kernel or filter) over a larger matrix representing an image. At
each position, we perform an element-wise multiplication of the overlapping matrix elements and sum the results. Then, the kernel
slides two-dimensionally to another
portion of the image matrix until the whole image is convoluted, and a feature map corresponding to a particular kernel is computed.
The gradients of a loss function with respect to kernel weights are then calculated in a backpropagation process, and the weights are
updated similarly to dense ANNs. There are usually many kernels in a CNN, as symbolically depicted in Fig. A.4 with three filters, and each
can extract spatial features. For example, a CNN can detect edges, the distribution of colors, etc., in an image, which makes these networks
very robust for image classification and for similar data with spatial properties.
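The sliding product-and-sum described above can be written out in a few lines; this is a plain "valid" single-channel convolution (more precisely, the cross-correlation that deep learning frameworks call convolution):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image and take
    an element-wise product-and-sum at every position."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Each distinct kernel produces one feature map; in a trained CNN, backpropagation shapes these kernels into edge, texture, or color detectors.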

Fig. A.4. 2D convolutional neural network.


The TCN (see Fig. A.5) is designed from two basic principles: 1) the convolutions are causal (i.e., there is no leakage of information
from future to past), and 2) the architecture takes sequences of arbitrary length and maps them to output sequences of the same length
(similar to RNNs). The first point is achieved by using causal convolutions: in the Conv1D layers used in TCN, the kernel slides along one
dimension, i.e., along time, and outputs at time t are only convolved with elements from time t and earlier in the previous layer. The
second point is achieved by using a one-dimensional fully-convolutional network, where each hidden layer has the same length as the
input layer.
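A minimal sketch of a causal (and optionally dilated) 1D convolution with left zero-padding, so the output has the same length as the input and never looks into the future:

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation=1):
    """Causal dilated 1D convolution: the output at time t mixes only x[t],
    x[t - d], x[t - 2d], ..., so no information leaks from the future, and
    left zero-padding keeps the output the same length as the input.
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(kernel[i] * xp[t + pad - i * dilation] for i in range(k))
                     for t in range(len(x))])
```

With kernel (1, 0) the operation is the identity, and with kernel (0, 1) it is a one-step (or d-step, with dilation) delay, which makes the causality easy to verify.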


Fig. A.5. Illustration for the TCN architecture: (a) residual block, and (b) dilated causal convolution with dilation factors d = 1, 2, and 4 and filter
size 3.
In order to learn from previous time steps, multiple 1D convolutional layers are typically stacked on top of each other, as shown in
Fig. A.5. The dilation factor (i.e., the gaps between entries) of subsequent convolution layers is increased exponentially to achieve large
receptive field sizes. The receptive field size of such a network can be computed as the kernel size multiplied by the dilation factor of the
last hidden layer. The receptive field size can be adjusted to accommodate the sequenced data by changing the kernel size and the
number of convolutional layers. A couple of advantages of TCN over RNN are:

∙ Computational and memory efficiency: TCN does not have recurrent and data-dependent calculations, and therefore, it allocates
much less memory than RNN and can be easily parallelized on multi-core CPU or GPU computational architectures.
∙ Consistent gradients: since TCNs consist of convolutional layers, they backpropagate differently from RNNs. Thus, all of the
gradients are preserved, and TCNs do not suffer from vanishing and exploding gradients.
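For illustration, the receptive field of a stack of dilated causal convolutions can also be tallied layer by layer; the formula below covers the single-convolution-per-level case, whereas a TCN residual block with two convolutions per level contributes twice as much per level:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal convolutions: each layer
    with dilation d widens the field by (kernel_size - 1) * d time steps.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Kernel size 3 with dilations 1, 2, 4 (as in Fig. A.5):
# 1 + 2*(1 + 2 + 4) = 15 time steps
```

Doubling the dilation at each level makes the receptive field grow exponentially with depth, which is what lets a shallow TCN cover the 100-step sequences used in this work.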

More details about the TCN network architecture applied in anomaly detection and its advantages over RNN can be found in the
work of Bai et al. (2018) and Alla and Adari (2019).

Appendix B. . Convergence plots and additional examples

Appendix B.1. Case 1

The data generated are divided into training (81%), validation (9%), and testing (10%) datasets. The training dataset is used to
minimize the loss function and solve for the weights and biases of the sequence learning models. The validation dataset is utilized to
evaluate the convergence of the model as the training process progresses. In other words, the validation dataset is used to detect
problems such as high bias or high variance (overfitting/underfitting). The training process is stochastic due to the initialization of
weights and biases and the shuffling of datapoints. Each model is trained ten times; Fig. B.1 depicts an example of the convergence
plots of each sequence learning model for case 1. Although the LSTM-based model learns well in a few training instances, its training is
generally highly unstable, as shown in Fig. B.1. Such an issue is not encountered with the GRU-based and TCN-based models. In all
training instances performed, both models provided accurate qualitative and quantitative results, with slight variations in the
converged SMAE value, and no significant overfitting or underfitting is observed.
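The 81/9/10 split used here is straightforward to reproduce; a sketch with an assumed seed for reproducibility:

```python
import numpy as np

def split_indices(n, seed=0):
    """Shuffle and split n datapoints into training (81%), validation (9%),
    and testing (10%) index sets, matching the proportions used here."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(0.81 * n)
    n_val = int(0.09 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Shuffling before splitting is what makes each training trial stochastic, together with the random initialization of the weights and biases.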

Fig. B.1. Convergence plots of the different sequence learning models (case 1).
Here, we show various complex deformation paths and their corresponding plastic energies and components of the stress tensor.
Figs. B.2-B.8 show a few more test examples for case 1. The LSTM-based model has the highest error values and often fails drastically to
capture the temporal stresses, while it captures the qualitative trend of plastic energy, as shown in Fig. B.2. Figs. B.3-B.8 further
demonstrate how the GRU-based and TCN-based models, learning from training data only, manage to predict the history- and time-


dependent constitutive model accurately in each test example, where these examples have never been seen before by the neural
networks during the training process. The goal of adding these examples is to show that sequence learning models, such as GRU-based
and TCN-based models, can accurately predict the temporal outputs corresponding to a spectrum of temporal input profiles, including
random linear and nonlinear and sinusoidal deformation paths.

Fig. B.2. An example from the testing dataset for predicting stresses and energy using the LSTM-based model (case 1).

Fig. B.3. An example from the testing dataset for predicting stresses and energy using the GRU-based model (case 1).


Fig. B.4. An example from the testing dataset for predicting stresses and energy using the TCN-based model (case 1).

Fig. B.5. An example from the testing dataset for predicting stresses and energy using the TCN-based model (case 1).


Fig. B.6. An example from the testing dataset for predicting stresses and energy using the TCN-based model (case 1).

Fig. B.7. An example from the testing dataset for predicting stresses and energy using the TCN-based model (case 1).


Fig. B.8. An example from the testing dataset for predicting stresses and energy using the TCN-based model (case 1).

Appendix B.2. Case 2

Similar to case 1, the data are split into training (81%), validation (9%), and testing (10%) datasets. To evaluate the performance of the
different sequence learning models, we train each model ten times, as the training process is influenced by the shuffling of the
datapoints and the initialization of the weights and biases. Fig. B.9 portrays an example of the convergence plots of each sequence learning
model for case 2, and similar conclusions about the performance of the different models are observed. The LSTM-based sequence learning
model is highly unstable, as shown in Fig. B.9, and it failed in all training instances to provide meaningful results. In contrast, the GRU-based
and TCN-based models do not experience any significant overfitting or underfitting; in all training instances performed, both
models accurately predicted the qualitative and quantitative responses, with only slight variations in the converged value of the SMAE, as
shown in Fig. 8.
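The shuffled 81%/9%/10% split described above can be sketched as follows; the seed and the helper name `split_dataset` are illustrative, and the paper repeats the shuffle independently for each of the ten training instances:

```python
import numpy as np

def split_dataset(n_samples, seed=0):
    # Shuffle the sample indices and split them 81% / 9% / 10% into
    # training, validation, and testing sets, as described in the text.
    # The seed is illustrative; a different shuffle is used for each
    # training instance.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = round(0.81 * n_samples)
    n_val = round(0.09 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_dataset(1000)
```

Shuffling before splitting ensures that the three subsets are statistically similar draws from the same pool of simulated input profiles.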

Fig. B.9. Convergence plots of the different sequence learning models (case 2).
Figs. B.10-B.15 show a few more test examples of case 2 with various temporal flux and displacement input profiles, the ground-truth
results calculated from the thermo-mechanical model, and the corresponding predictions from the sequence learning neural networks.
These figures further demonstrate how the TCN-based and GRU-based models, learning from the training data only, correctly reproduced
the complex history- and time-dependent phenomena in each test example, none of which was seen by the neural networks during training. For
instance, in Fig. B.12, a low value of the thermal flux at around 10 s, followed by its increase over the next 5 s, resulted in a temporary rise in
the slice temperatures (reheating) followed by a cooling period. This also produced stress reversals, from early-time tension to compression
and back to tension at later times, all of which were correctly inferred by the TCN model.
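The TCN's ability to connect a flux change at around 10 s to stresses many time steps later is bounded by the receptive field of its stacked dilated causal convolutions. The standard bookkeeping for that receptive field is sketched below with illustrative hyperparameters; the paper's actual TCN settings are not assumed:

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block=2):
    # Receptive field (in time steps) of a stack of dilated causal
    # convolutions: each residual block at dilation d applies
    # convs_per_block causal convolutions, and each convolution extends
    # the look-back by (kernel_size - 1) * d steps.
    # The hyperparameters used below are illustrative, not the paper's.
    rf = 1
    for d in dilations:
        rf += convs_per_block * (kernel_size - 1) * d
    return rf

# A kernel size of 3 with dilations doubling over five levels already
# covers a history of 125 time steps.
rf = tcn_receptive_field(3, [1, 2, 4, 8, 16])
```

Because the dilations grow geometrically, the receptive field grows exponentially with depth, which is why a modest TCN can capture the long-range thermal and stress histories shown in these examples.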


Fig. B.10. An example from the testing dataset for predicting temperatures and stresses using the LSTM-based model (case 2).


Fig. B.11. An example from the testing dataset for predicting temperatures and stresses using the GRU-based model (case 2).


Fig. B.12. An example from the testing dataset for predicting temperatures and stresses using the TCN-based model (case 2).


Fig. B.13. An example from the testing dataset for predicting temperatures and stresses using the TCN-based model (case 2).


Fig. B.14. An example from the testing dataset for predicting temperatures and stresses using the TCN-based model (case 2).


Fig. B.15. An example from the testing dataset for predicting temperatures and stresses using the TCN-based model (case 2).
