Physics-Informed Neural Networks For Multiphysics Data
com/science/article/pii/S0309170819311649
Abstract
∗ Corresponding author
Email address: Alexandre.Tartakovsky@pnnl.gov (Alexandre M. Tartakovsky)
© 2020 published by Elsevier. This manuscript is made available under the Elsevier user license
https://www.elsevier.com/open-access/userlicense/1.0/
the accuracy of parameter estimation increases as more different multiphysics
variables are inverted jointly.
Keywords: Physics-informed deep neural networks, data assimilation,
parameter estimation, inverse problems, subsurface flow and transport
1. Introduction
Recent advances in machine learning (ML) methods, automatic differentiation
(AD) [19], and ML libraries (e.g., TensorFlow [20] and PyTorch [21]) have made
them potentially powerful tools for parameter estimation and data assimilation.
For example, Schmidt and Lipson [22] applied symbolic regression to learn
conservation laws, and Brunton et al. [23] used sparse regression to discover
equations of nonlinear dynamics directly from data. Physics-informed (deep)
neural networks (PINNs) were used to learn solutions and parameters in partial
and ordinary differential equations [24, 25, 26]. Recently, PINNs were
extended for inverse problems associated with partial differential equations
(PDEs) with space-dependent coefficients (e.g., to estimate hydraulic conductivity
using sparse measurements of conductivity and hydraulic head) [27].
In this study, we extend the PINN-based parameter estimation method of [27]
to assimilate multiphysics measurements and refer to this multiphysics-informed
neural network approach as MPINN. We consider a subsurface transport problem
with sparse measurements of hydraulic conductivity, hydraulic head, and solute
concentration. In this approach, we use the Darcy and advection–dispersion
equations together with data to train deep neural networks (DNNs) that represent
space-dependent conductivity, head, and concentration fields. During training
of the DNNs, the governing equations and the associated boundary conditions
are enforced at the “residual” points over the domain. We demonstrate that
for sparse data, the MPINN approach significantly improves the accuracy of
parameter and state estimation as compared to standard DNNs trained with data
only. The MPINN approach can be easily extended to assimilate other types of
variables and physics laws, e.g., geophysical measurements and the corresponding
equations that describe the relationships between electrical resistivity, current,
and potential.
This paper is organized as follows. In Section 2, we describe the MPINN
method and its formulation for transport problems. The performance of the
MPINN approach for data assimilation, including the dependence of estimation
errors on the number of measurements, is given in Section 3. The effects of
the neural network size and the conductivity field correlation length on the
parameter estimation errors are discussed in Section 4. Conclusions are given in
Section 5.
In the PINN approach and its MPINN extension, we employ fully connected
feed-forward networks to approximate unknown variables (states) and space-
dependent parameters, as described in Appendix A and shown in Figure A.17.
Given a sufficiently large number of hidden layers, DNNs have excellent
representation properties but require a large amount of training data. This creates a
challenge in applying DNNs to subsurface problems where measurements are
usually sparse. For the purpose of this work, we define sparse measurements
as those that do not sufficiently cover the computational domain to accurately
estimate parameters with the standard data-driven DNN method described
in Appendix A. In [27], we demonstrated that the Darcy law can be used as a
constraint for training a DNN model of conductivity that significantly improves
the predictive ability of the DNN model.
In the rest of this section, we extend the PINN parameter estimation method
of [27] to a data assimilation problem where different types of measurements are
used to estimate parameters and states. Consider a system of PDEs forming the
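boundary value problem defined on the domain Ω ⊂ R^d with the boundary ∂Ω:
$$\mathcal{L}(u(x); p(x)) = 0, \quad x \in \Omega, \qquad \mathcal{B}(u(x); p(x)) = 0, \quad x \in \partial\Omega, \qquad (1)$$
where u(x) is the vector of state variables, p(x) is the space-dependent
parameter vector (e.g., hydraulic and electric conductivities), L denotes the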
known (nonlinear) differential operator, and the operator B expresses arbitrary
boundary conditions associated with the problem. The boundary conditions can
be of the Dirichlet and Neumann types applied on ∂D Ω and ∂N Ω, respectively,
such that ∂D Ω ∪ ∂N Ω = ∂Ω and ∂D Ω ∩ ∂N Ω = ∅.
We use the DNNs to approximate both state variables and unknown param-
eters, u(x) ≈ û(x; θ) and p(x) ≈ p̂(x; γ), x ∈ Ω, where θ and γ are weights
or parameters (which need to be estimated or trained) in the corresponding
DNNs. To determine these parameters, we minimize the loss function J(θ, γ)
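with physics-informed penalty terms:
$$(\theta, \gamma) = \arg\min_{\theta, \gamma} J(\theta, \gamma), \qquad (2)$$
where
$$J(\theta, \gamma) = J_d(\theta, \gamma) + \omega_f J_f(\theta, \gamma) + \omega_b J_b(\theta, \gamma). \qquad (3)$$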
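Here, Jd (θ, γ) is the loss due to a mismatch with the data (i.e., the measurements
of u and p):
$$J_d(\theta, \gamma) = \frac{1}{|T_u|} \sum_{x \in T_u} \left(\hat{u}(x; \theta) - u^*(x)\right)^2 + \frac{1}{|T_p|} \sum_{x \in T_p} \left(\hat{p}(x; \gamma) - p^*(x)\right)^2, \qquad (4)$$
Jf (θ, γ) is the loss due to mismatch with the governing PDEs L(u(x); p(x)) = 0:
$$J_f(\theta, \gamma) = \frac{1}{|T_f|} \sum_{x \in T_f} \left(\mathcal{L}(\hat{u}(x; \theta); \hat{p}(x; \gamma))\right)^2, \qquad (5)$$
and Jb (θ, γ) is the loss due to mismatch with the boundary conditions B(u(x); p(x)) = 0:
$$J_b(\theta, \gamma) = \frac{1}{|T_b|} \sum_{x \in T_b} \left(\mathcal{B}(\hat{u}(x; \theta); \hat{p}(x; \gamma))\right)^2. \qquad (6)$$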
In (3), ωf and ωb are weights that determine how strongly mismatch with
the governing PDEs and boundary conditions is penalized relative to data
mismatch. In this work, we assume that the measurements and physics model
are exact and set ωf = ωb = 1. The sets Tu = {x1 , x2 , ..., x|Tu | } ⊂ Ω and
Tp = {x1 , x2 , ..., x|Tp | } ⊂ Ω denote the measurement locations of u and p,
respectively, and u∗ (x), x ∈ Tu and p∗ (x), x ∈ Tp are the measured values of
u and p at these locations. The sets Tf = {x1 , x2 , ..., x|Tf | } ⊂ Ω and Tb =
{x1 , x2 , ..., x|Tb | } ⊂ ∂Ω denote locations of the “residual” points where Jf (θ, γ)
and Jb (θ, γ) are, respectively, minimized. The penalty terms Jf (θ, γ) and Jb (θ, γ)
force the DNN approximations of u and p to satisfy the governing equation (1)
at the residual points. Note that while it is preferable to enforce physics over
the whole domain, the computational cost of estimating and minimizing the
loss function (3) increases with the number of residual points. In this work, we
demonstrate convergence of the solution of (2) with an increasing number of
residual points, meaning that the DNNs û(x; θ) and p̂(x; γ) can be accurately
trained using a finite number of residual points. Similar convergence results for
solving PDEs with the PINN method were also observed in [24, 28, 27, 25, 29].
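As a minimal sketch of how the loss (3) could be assembled from the terms (4)–(6), the snippet below uses NumPy and takes the network predictions and the PDE and boundary residuals as precomputed arrays. The function name `mpinn_loss` and its arguments are illustrative assumptions, not the implementation used in this work.

```python
import numpy as np

def mpinn_loss(u_hat, u_obs, p_hat, p_obs, pde_res, bc_res, w_f=1.0, w_b=1.0):
    """Return J = J_d + w_f * J_f + w_b * J_b and its three parts.

    u_hat, p_hat : predictions at the measurement locations T_u and T_p
    u_obs, p_obs : measured values u* and p* at the same locations
    pde_res      : PDE residuals evaluated at the residual points T_f
    bc_res       : boundary-condition residuals evaluated at T_b
    All inputs are hypothetical placeholders for network outputs.
    """
    j_d = np.mean((u_hat - u_obs) ** 2) + np.mean((p_hat - p_obs) ** 2)
    j_f = np.mean(pde_res ** 2)
    j_b = np.mean(bc_res ** 2)
    return j_d + w_f * j_f + w_b * j_b, (j_d, j_f, j_b)

# Toy numbers chosen so the result is easy to verify by hand.
total, parts = mpinn_loss(
    u_hat=np.array([1.0, 2.0]), u_obs=np.array([1.0, 1.0]),
    p_hat=np.array([2.0]), p_obs=np.array([1.0]),
    pde_res=np.array([1.0, -1.0, 0.0, 0.0]),
    bc_res=np.array([2.0, 0.0]),
)
print(total, parts)
```

Lowering `w_f` or `w_b` in this sketch reduces the influence of the corresponding physics term relative to the data mismatch.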
The loss Jf (θ, γ) is evaluated by computing spatial derivatives of û(x; θ) and
p̂(x; γ) using AD. AD is also used to evaluate the normal derivative n · ∇ in
the Neumann boundary condition in the loss Jb (θ, γ) (see details in Section 2.2).
AD is implemented in most ML libraries, including TensorFlow and PyTorch
[21], where it is mainly used to compute derivatives with respect to the DNN
weights (i.e., θ and γ). In the PINN method, AD allows the implementation of
any PDE and boundary condition constraints without numerically discretizing
and solving the PDEs.
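To illustrate what AD supplies to the physics loss, the sketch below differentiates a small one-hidden-layer tanh network with respect to its input by coding the chain rule by hand, and checks the result against a central finite difference. This is only an illustration in NumPy: TensorFlow and PyTorch carry out the same chain-rule bookkeeping automatically. The network, its random weights, and the function names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), rng.normal(size=8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)   # hidden -> output

def net(x):
    """Scalar field u_hat(x) at a single point x of shape (2,)."""
    return (np.tanh(x @ W1 + b1) @ W2 + b2)[0]

def grad_net(x):
    """Gradient of u_hat with respect to x, obtained with the chain rule."""
    z = np.tanh(x @ W1 + b1)        # hidden activations, shape (8,)
    dz = 1.0 - z ** 2               # derivative of tanh at the pre-activations
    return W1 @ (dz * W2[:, 0])     # shape (2,)

# Compare with a central finite difference at an arbitrary point.
x0 = np.array([0.3, 0.7])
h = 1e-6
fd = np.array([(net(x0 + h * e) - net(x0 - h * e)) / (2 * h) for e in np.eye(2)])
max_diff = np.max(np.abs(fd - grad_net(x0)))
print(max_diff)
```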
Another benefit of enforcing PDE constraints via the penalty term Jf (θ, γ) is
that it allows using the corresponding weight ωf to account for the fidelity of the
PDE model. For example, we can assign a smaller weight to a low-fidelity PDE
model. In general, the number of unknown parameters in θ and γ is much larger
than the number of measurements, and training the DNNs requires regularization.
One can consider the losses Jb (θ, γ) and Jf (θ, γ) in the minimization problem
(2) as physics-informed regularization terms [27, 30].
2.2. Application of MPINN for subsurface transport problems
For sparsely sampled systems, data assimilation can significantly improve the
accuracy of parameter and state estimation. Here, we assume that the sparse
steady-state measurements of a synthetic tracer test in a heterogeneous porous
domain Ω = [0, L1 ] × [0, L2 ] are available, where the solute is continually injected
at the x1 = 0 boundary. This data includes the measurements of conductivity
$K_i^* := K(x_i^K)$, hydraulic head $h_i^* := h(x_i^h)$, and concentration $C_i^* := C(x_i^C)$ at
the locations $\{x_i^K\}_{i=1}^{N_K}$, $\{x_i^h\}_{i=1}^{N_h}$, and $\{x_i^C\}_{i=1}^{N_C}$, respectively, where $N_K$, $N_h$, and
where φ is the effective porosity of the medium, v is the average pore velocity,
and D is the dispersion coefficient:
$$D = D_w \tau I + \alpha \|v\|_2. \qquad (9)$$
αL and αT. The conductivity K(x) is assumed to be unknown except at the
measurement locations $\{x_i^K\}_{i=1}^{N_K}$.
$$\hat{K}(x) := \mathcal{N}_K(x; \theta_K),$$
$$\hat{h}(x) := \mathcal{N}_h(x; \theta_h),$$
$$\hat{C}(x) := \mathcal{N}_C(x; \theta_C),$$
where θK , θh , and θC are the vectors of parameters associated with each neural
network. For the considered two-dimensional problem, the dimension of the
input layers in these DNNs is two. The K, h, and C fields are scalar; therefore,
the dimensionality of the output layers in these DNNs is one.
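A minimal NumPy sketch of such a network, with a two-dimensional input and a scalar output, is given below. The tanh activation, the zero bias initialization, and the names `xavier_init` and `forward` are assumptions made for illustration; the Xavier (Glorot) uniform rule is used for the weights.

```python
import numpy as np

def xavier_init(sizes, rng):
    """Xavier (Glorot) uniform weights and zero biases for layer widths `sizes`."""
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def forward(params, x):
    """Evaluate the network at points x of shape (n, 2); returns shape (n,)."""
    a = x
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)       # hidden layers (tanh is an assumed choice)
    W, b = params[-1]
    return (a @ W + b)[:, 0]         # linear output layer, one scalar per point

rng = np.random.default_rng(0)
theta_K = xavier_init([2, 32, 32, 32, 1], rng)   # 3 hidden layers of 32 neurons
pts = rng.uniform(size=(5, 2))                   # five sample points (x1, x2)
K_hat = forward(theta_K, pts)
print(K_hat.shape)
```

The same two functions could represent the h and C networks by creating separate parameter lists, one per field.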
The specific form of the general loss function (3) for training these DNNs is
given by equations (B.4)–(B.6) in Appendix B. In Eq. (B.6), PDEs (7) and (8)
are enforced at the residual points given by the sets Tfh and TfC , respectively,
where |Tfh | = Nfh and |TfC | = NfC . A schematic diagram of the MPINN method
for data assimilation in the transport problem described by Equations (7) and
(8) is shown in Figure 1.
In this work, we compare three approaches for training K̂: the MPINN
approach where we jointly train the DNNs K̂, ĥ, and Ĉ by minimizing the loss
function (B.4); the PINN–Darcy approach where we jointly train K̂ and ĥ by
only enforcing the Darcy equation and boundary conditions (7); the data-driven
DNN approach where we separately train K̂, ĥ, and Ĉ using only data. The
performance of these three approaches is investigated and compared in Sections
3 and 4.
Given that the loss function is highly nonlinear and non-convex with respect to
the network parameters θK , θh , and θC , we use gradient-based minimization
algorithms, including the Adam [31] and L-BFGS-B [32] methods. In the L-BFGS-B
optimizer, the iterative minimization process is terminated once the
[Figure 1 schematic: the input x feeds three DNNs (parameters θK, θh, θC) with outputs K, h, and C; AD layers compute ∇K, ∇h, ∇C, and v; these enter the PDE and boundary residuals of the Darcy and advection–dispersion equations, which form the physics-informed loss J(θK, θh, θC); the parameters are updated by minimizing J.]
Figure 1: A schematic diagram of the MPINN method for multiphysics data assimilation in
subsurface transport problems. Three DNNs are used to represent the unknown K(x), h(x),
and C(x) fields. Spatial derivatives of these fields in the PDE and boundary condition residuals
are computed with AD. The multiphysics loss function J and PDE residuals $f^h$, $f_N^h$, $f^C$, and
relative change in the loss function becomes smaller than a prescribed value. In
the Adam method, the DNNs training stops once the total loss function becomes
smaller than a prescribed small value or the predefined number of iterations
(epochs) is completed. As suggested in [24, 27, 33, 34], L-BFGS-B, a quasi-Newton
method, shows superior performance with a better rate of convergence,
less gradient vanishing, and a lower computational cost for problems with a
relatively small amount of training data and/or residual points. In this study,
we employ the L-BFGS-B method for the data-driven DNN and PINN–Darcy
methods with the default settings from SciPy [35]. However, our numerical
experiments show that the L-BFGS-B algorithm converges slowly for the
MPINN method, where a relatively large number of residual points is used.
We propose a two-step training algorithm, where the loss function is first
minimized by the Adam algorithm with a prescribed stopping criterion and then
by the L-BFGS-B optimizer. In this work, we use the two-step algorithm in
all MPINN simulations unless it is stated otherwise. At the beginning of the
training process, the parameters of the neural networks are randomly initialized
using the Xavier scheme [36].
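The two-step idea can be sketched on a toy problem as follows. Here the Rosenbrock function from SciPy stands in for the MPINN loss, Adam is written out in NumPy, and the result is refined with SciPy's L-BFGS-B implementation. The Adam hyperparameters, the iteration budget, and the tolerance are assumptions chosen for the example, not the settings used in this work.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

def adam(loss, grad, x0, lr=0.01, n_iter=2000, tol=1e-8,
         beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam; stops when the loss drops below `tol` or after `n_iter` steps."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, n_iter + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2     # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias corrections
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
        if loss(x) < tol:
            break
    return x

# Step 1: Adam from a standard starting point of the toy problem.
x_adam = adam(rosen, rosen_der, x0=[-1.2, 1.0])
# Step 2: refine the Adam result with L-BFGS-B (SciPy default settings).
result = minimize(rosen, x_adam, jac=rosen_der, method="L-BFGS-B")
x_final = result.x
print(rosen(x_adam), rosen(x_final))
```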
where γ(x) and γ̂(x, θγ ) denote the reference fields and the DNN approximations,
respectively.
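The defining equation for the error is not reproduced above. Assuming the usual definition of the relative L2 error, that is, the 2-norm of the difference between the approximation and the reference divided by the 2-norm of the reference, it could be evaluated on grid values as in this short sketch (the function name is hypothetical):

```python
import numpy as np

def relative_l2_error(ref, approx):
    """Relative L2 error ||approx - ref||_2 / ||ref||_2 (assumed definition)."""
    ref = np.asarray(ref, dtype=float)
    approx = np.asarray(approx, dtype=float)
    return np.linalg.norm(approx - ref) / np.linalg.norm(ref)

ref = np.array([3.0, 4.0])        # reference values, norm 5
approx = np.array([3.0, 4.5])     # approximation, difference norm 0.5
err = relative_l2_error(ref, approx)   # 0.5 / 5 = 0.1
print(err)
```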
We first investigate the effect of the DNN size nh × mh on the approximation
errors, where nh is the number of hidden layers and mh is the number of neurons
in each hidden layer. Note that all DNNs have a two-dimensional input layer
(corresponding to x1 and x2 ) and a one-dimensional output layer (corresponding
to scalar quantities K, h, or C).
Figure 2: Reference fields: (a) conductivity K, (b) hydraulic head h, and (c) concentration C.
We test the accuracy of the data-driven DNN approach (i.e., regression) for
estimating K(x) to establish a baseline for comparison with the MPINN and
PINN–Darcy methods. The K̂ DNN sizes and the corresponding mean and
variance of the L2 errors are summarized in Table 1. The statistics of the L2
errors are computed from five simulations in which the DNNs are randomly
initialized using the Xavier algorithm. The size of the networks is varied by
changing the number of hidden layers nh , while the number of neurons per layer
is set to mh = 32. The regression errors decrease from more than 100% for
Table 1: The effect of L1 and L2 regularization on the accuracy of the data-driven DNN K(x)
estimation. The mean and standard deviation of εK as functions of the DNN size (the number
of hidden layers nh ) and the number of K measurements. The DNNs K̂(x, θK ) are trained
using the data-driven DNN method both with and without L1 or L2 regularization. The
corresponding standard deviations are given in parentheses.
Number of K measurements, NK
Method    DNN size   16               32              48              64              80             96
DNN       3 × 32     156.1% (0.482)   53.5% (0.391)   44.6% (0.287)   17.5% (0.135)   6.0% (0.022)   3.2% (0.010)
DNN       4 × 32     206.9% (2.068)   64.9% (0.420)   42.8% (0.282)   11.4% (0.121)   4.5% (0.019)   3.3% (0.005)
DNN       5 × 32     128.0% (1.304)   53.1% (0.262)   49.7% (0.324)   21.4% (0.212)   4.3% (0.018)   3.0% (0.005)
DNN+L1    3 × 32     30.0% (0.051)    21.1% (0.030)   9.9% (0.023)    4.3% (0.008)    3.1% (0.011)   2.0% (0.002)
DNN+L1    4 × 32     28.6% (0.049)    23.0% (0.028)   11.7% (0.014)   5.3% (0.016)    2.6% (0.005)   1.7% (0.001)
DNN+L1    5 × 32     28.1% (0.046)    22.3% (0.057)   10.7% (0.029)   3.2% (0.009)    2.1% (0.003)   1.9% (0.002)
DNN+L2    3 × 32     26.4% (0.044)    20.0% (0.016)   7.7% (0.010)    3.38% (0.009)   2.8% (0.004)   2.0% (0.003)
DNN+L2    4 × 32     34.5% (0.124)    17.8% (0.031)   10.1% (0.009)   3.1% (0.004)    2.6% (0.007)   2.1% (0.003)
DNN+L2    5 × 32     28.8% (0.050)    19.5% (0.027)   9.6% (0.019)    2.9% (0.006)    2.5% (0.004)   1.9% (0.002)
direct observations in the K̂ DNN training.
[Figure 3 plots: four panels (a)–(d); the horizontal axes span 10 to 100.]
Figure 3: Mean of the εK error in the data-driven DNN and PINN–Darcy estimations of
K(x) (upper) and hydraulic head h(x) (bottom) as functions of N = NK = Nh and the
number of residual points Nfh . The right and left columns present results with and without L2
regularization, respectively. The bars correspond to one standard deviation of εK and quantify
uncertainty due to random initialization of DNNs. The K̂ and ĥ DNN sizes are 5 × 32 and
3 × 32, respectively.
The L2 regularization significantly reduces the mean and standard deviation
of εK and εh. However, for N < 50, the PINN–Darcy method with Nfh = 400
provides more accurate results for both the K and h fields. Adding L2
regularization to the PINN method further reduces εK and εh, especially for a
relatively small number of residual points Nfh. Because the computational cost
of PINN–Darcy and MPINN increases with increasing Nfh, a combination of L2
regularization and physics constraints can potentially reduce the computational
cost of the PINN–Darcy and MPINN methods. Finally, we analyze the decay of
the loss functions during the K̂ DNN training in the data-driven DNN,
data-driven DNN with L2 regularization, and PINN–Darcy methods with NK = 36,
Nfh = 200, and the L-BFGS-B optimizer. These loss functions are shown in
Figure 4. The data-driven DNN method exhibits overfitting, as evident from
the small training error and large test error (see Figure 3 (c)), whereas both
the PINN–Darcy constraints and L2 regularization prevent overfitting. We also see that the
L-BFGS-B optimizer is robust for these three approaches.
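For reference, a data loss with an L2 penalty on the network weights can be sketched as below. The regularization coefficient `lam` and the function name are hypothetical; the value used in this study is not stated in this section.

```python
import numpy as np

def l2_regularized_loss(pred, obs, weights, lam=1e-3):
    """Mean squared data mismatch plus lam times the sum of squared weights."""
    data_loss = np.mean((pred - obs) ** 2)
    penalty = sum(np.sum(W ** 2) for W in weights)
    return data_loss + lam * penalty

# Toy weight matrices and data chosen so the value can be checked by hand:
# data loss = 2.0, penalty = 6 + 12 = 18, total = 2.0 + 0.1 * 18 = 3.8.
weights = [np.ones((2, 3)), np.full((3, 1), 2.0)]
loss = l2_regularized_loss(np.array([1.0, 2.0]), np.array([1.0, 0.0]),
                           weights, lam=0.1)
print(loss)
```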
Figure 4: Loss functions in (a) data-driven DNN, (b) data-driven DNN with L2 regularization,
and (c) PINN–Darcy methods for estimating K(x) with NK = 36. In figure (c), Nh = 36,
Nfh = 200, J is the total loss, and JK and Jh are the parts of the loss function with respect to
K and h measurements, respectively. The DNN sizes are 5 × 32 for K̂ and 3 × 32 for ĥ.
Here, we investigate the MPINN method for jointly training the K̂(x; θK ),
ĥ(x; θh ), and Ĉ(x; θC ) DNNs. Figure 5 shows the mean errors of the MPINN-
estimated fields as functions of NK , Nh , NC , and NfC . The number of points
[Figure 5 plots: three panels (a)–(c); the horizontal axes span 10 to 100.]
Figure 5: The relative L2 errors εK, εh, and εC in the MPINN estimation of (a) conductivity
K(x), (b) hydraulic head h(x), and (c) concentration C(x), respectively, versus the number
of measurements N = NK = Nh , NC , and the number of residual points NfC . Errors in the
PINN–Darcy K and h estimations are also provided in (a) and (b), respectively. In all cases,
Nfh = 200, and the K̂, ĥ, and Ĉ DNN size is 5 × 32.
where the residuals of the Darcy equation are minimized is set to Nfh = 200 in
all cases. For comparison, we also show the PINN–Darcy mean εK and εh errors.
For a small number of K and h measurements (N < 50), the MPINN method
reduces εK by approximately 25% and εh by approximately 80% relative to the PINN–Darcy
method.
The MPINN method leads to an even bigger improvement in the C field
estimation relative to the data-driven DNN method. For example, for NC = 64,
εC is 0.02 in MPINN with NfC = 1000, while εC = 0.22 in the data-driven DNN
method. In addition, the C field estimation improves as NK and Nh increase.
[Figure 6 plot: loss on a logarithmic vertical axis (10^-6 to 10^0) versus the number of epochs (0 to 50, in units of 10^3).]
Figure 6: Loss J as a function of the number of epochs in the MPINN joint training of
the K̂, ĥ, and Ĉ DNNs. Also shown are JfC (the part of the loss due to residual in the
advection–dispersion equation B.1(b)) and JK , Jh , and JC , which are parts of the loss function
with respect to K, h, and C measurements, respectively. The K̂, ĥ, and Ĉ DNN size is 5 × 32
and NK = Nh = 36, NC = 64, Nfh = 200, and NfC = 1000.
algorithm. The small final value of JfC indicates that the Ĉ, K̂, and ĥ DNNs
approximately satisfy the advection–dispersion equation (8).
Figure 7: Absolute point errors computed as the difference between the reference K(x) and
K̂(x, θK ) estimated with (a) data-driven DNN, (b) data-driven DNN with L2 regularization, (c)
PINN–Darcy, and (d) MPINN. In these simulations, NK = 36, Nh = 36, NC = 64, Nfh = 200,
and NfC = 1000. The locations of K measurements are denoted by black circles. Relative L2
errors εK are also provided for all DNN methods.
The distributions of absolute point errors in the K(x), h(x), and C(x) fields
learned with the data-driven DNN, data-driven DNN with L2 regularization,
PINN–Darcy, and MPINN methods are given in Figures 7–9. In this comparison
study, we use NK = Nh = 36, NC = 64, Nfh = 200, and NfC = 1000. As
expected from a regression method, the data-driven DNN method errors increase
as distance from the measurement locations increases. L2 regularization helps
reduce these errors. The PINN–Darcy and MPINN methods further reduce point
(a) DNN: εh = 2.68% (b) PINN–Darcy: εh = 1.47% (c) MPINN: εh = 0.77%
Figure 8: Relative L2 error εh and absolute errors (differences) between the reference h(x)
and ĥ(x, θh ) trained with (a) data-driven DNN, (b) PINN–Darcy, and (c) MPINN. In these
examples, Nh = NK = 36, NC = 64, Nfh = 200, and NfC = 1000. Locations of h measurements
are denoted by black circles.
Figure 9: Relative L2 errors εC and absolute point errors (differences) between the reference
C(x) and Ĉ(x, θC ) estimated with (a) data-driven DNN and (b) MPINN. In these examples,
NC = 64, NK = Nh = 36, Nfh = 200, and NfC = 1000. Locations of C measurements are
denoted by black circles.
errors, especially in parts of the domain with no measurements. For example, the
data-driven DNN method yields a poor approximation of C near the injection
point, with absolute point errors on the order of 0.1. In the MPINN method
with the same number of C measurements, the point errors in the same region
are on the order of 0.01.
4. DNN methods for estimating conductivity with complex correlation structure
Here, we investigate the performance of the DNN methods for estimating the
spatially correlated conductivity field K(x) = exp(Y(x)) with the exponential
covariance function $C_Y(x, x') = \sigma^2 \exp(-\|x - x'\| / (2\lambda^2))$, where σ² and λ are
the variance and correlation length of Y (x), respectively. Specifically, we study
the performance of the DNN methods as a function of λ.
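A small-scale sketch of how one realization of such a field might be drawn is shown below: the covariance matrix is built on a coarse grid exactly as written above, factored with a Cholesky decomposition, and applied to standard normal samples. The zero mean of Y, the 16 × 8 grid, the value σ² = 1, and the function name are assumptions for illustration only and do not reproduce the reference fields of this study.

```python
import numpy as np

def sample_conductivity(nx, ny, lx, ly, sigma2, lam, rng):
    """Draw K = exp(Y) on an nx-by-ny grid over [0, lx] x [0, ly]."""
    x1 = np.linspace(0.0, lx, nx)
    x2 = np.linspace(0.0, ly, ny)
    X1, X2 = np.meshgrid(x1, x2, indexing="ij")
    pts = np.column_stack([X1.ravel(), X2.ravel()])
    # Pairwise distances ||x - x'|| between all grid points.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Covariance as written in the text: sigma^2 exp(-||x - x'|| / (2 lam^2)).
    cov = sigma2 * np.exp(-dist / (2.0 * lam ** 2))
    cov += 1e-10 * sigma2 * np.eye(len(pts))     # small jitter for stability
    chol = np.linalg.cholesky(cov)
    Y = chol @ rng.standard_normal(len(pts))     # zero-mean Gaussian field
    return np.exp(Y).reshape(nx, ny), cov

rng = np.random.default_rng(0)
K, cov = sample_conductivity(16, 8, 1.0, 0.5, sigma2=1.0, lam=0.5, rng=rng)
print(K.shape, K.min() > 0.0)
```

The dense Cholesky factorization scales cubically with the number of grid points, so this approach is only practical for coarse grids.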
In Section 3, we showed that the network size affects the accuracy of the DNN
predictions, especially when the data is sparse. In this section, we study the
dependence of the optimal DNN size on the correlation length of the approximated
field.
We consider three K(x) fields generated as realizations of lognormal processes
with λ = 0.2, 0.5, and 1.0 (see Figure 10).
Table 2: The total number of tunable parameters (DOF) of the DNN structure 3 × mh
as a function of mh.
mh    10    20    30     40     50     60     70      80      90      100
DOF   261   921   1981   3441   5301   7561   10221   13281   16741   20601
Here, we vary the DNN size by changing the number of neurons in each
hidden layer mh . The number of tunable parameters as a function of mh for the
chosen DNN architecture is given in Table 2. The conductivity fields in Figure
10 are generated on the domain Ω = [0, 1] × [0, 0.5] on a 256 × 128 grid with
32,768 grid points. Here, we use the values of K at 20,000 grid points to train
K̂(x, θK ) (without any physics constraints) and use the remaining K values to
evaluate the accuracy of K̂(x, θK ).
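The entries of Table 2 follow from counting the weights and biases of a fully connected network with two inputs, three hidden layers of width mh, and one output. The sketch below reproduces that count and shows one possible way to split the 32,768 grid points into 20,000 training points and the remaining test points. The random permutation is an assumption, since the selection rule is not specified here.

```python
import numpy as np

def count_parameters(n_hidden, width, n_in=2, n_out=1):
    """Number of weights and biases in a fully connected feed-forward network."""
    sizes = [n_in] + [width] * n_hidden + [n_out]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

# Parameter counts for 3 hidden layers and mh = 10, 20, ..., 100.
dof = {m: count_parameters(3, m) for m in range(10, 101, 10)}
print(dof)

# Assumed random split of the 256 x 128 grid into training and test points.
rng = np.random.default_rng(0)
n_grid = 256 * 128
perm = rng.permutation(n_grid)
train_idx, test_idx = perm[:20000], perm[20000:]
print(len(train_idx), len(test_idx))
```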
For this large number of measurements, we find that the L-BFGS-B algorithm
is not efficient for minimizing the loss function, especially for the field with λ = 0.2.
Figure 10: Reference conductivity fields with different correlation lengths: (a) λ = 0.2, (b)
λ = 0.5, and (c) λ = 1.0.
Therefore, we adopt the Adam method with an experimentally determined initial
learning rate of 0.0002 and a batch size of 1000. We find that 4 × 10^5 iterations are
needed to train K̂(x, θK ) for λ = 0.2, 3 × 10^5 iterations for λ = 0.5, and 2 × 10^5
iterations for λ = 1.0 to achieve a sufficiently low training error. Our results
show that fewer iterations are needed to train DNNs for smoother conductivity
fields (with larger correlation lengths).
Figure 11 shows the mean and standard deviation of εK as functions of mh
for the three correlation lengths. The statistical moments of εK are computed
from simulations with 10 different DNN initializations. Initially, the approximation
error decreases as the DNN size increases because of the improved DNN
representation ability. We can see that a smaller DNN is sufficient to represent
a smoother field with larger correlation length. We also see that for fields with
[Figure 11 plots: three panels; the horizontal axes show mh.]
Figure 11: The relative error of DNN approximation as a function of mh (number of neurons
in each hidden layer) for three conductivity fields with the correlation lengths: (a) λ = 0.2,
(b) λ = 0.5, and (c) λ = 1.0.
the correlation lengths 0.5 and 1, the approximation error increases because of
overfitting once the DNN size exceeds the optimal size. For example, for the
K field with λ = 0.5 (see Figure 11 (b)), the smallest relative error of 0.52% is
reached at mh = 60. DNNs with mh < 60 are not representative enough, and
DNNs with mh > 60 cause overfitting. Therefore, we postulate that for a DNN
with three hidden layers, mh = 60 is optimal for this K field. For the K fields
with λ = 0.2 and λ = 1.0, the optimal DNN size is reached at mh = 90 and
mh = 40, respectively. For the K field with λ = 1.0, the minimum of the mean
error function (≈ 0.3%) is very shallow, as shown in Figure 11 (c). Therefore,
we select mh = 40 as the optimal DNN width because it results in the εK with
the smallest standard deviation.
Figure 12: The optimal neural network size as a function of the correlation length λ.
Figure 12 shows the optimal DNN size as a function of λ. For the considered
range of λ, the DNN size decreases as a power of λ. It is important to note that
355 in addition to the correlation length of the modeled field, the optimal DNN size
depends on many other factors, including the type of activation function and
the number of hidden layers. In this study, we fix the number of hidden layers
and the activation function. Therefore, the results in Figure 12 might not apply
to other DNN architectures.
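As a rough illustration of the power-law trend, the sketch below fits a straight line in log-log space to the three optimal widths reported above (mh = 90, 60, and 40 for λ = 0.2, 0.5, and 1.0). Using the width mh as the measure of DNN size is an assumption, and a three-point fit should be read as indicative only; it is not the fit shown in Figure 12.

```python
import numpy as np

lam = np.array([0.2, 0.5, 1.0])        # correlation lengths
m_opt = np.array([90.0, 60.0, 40.0])   # optimal widths reported in the text

# Least-squares fit of log(m_opt) = slope * log(lam) + intercept,
# i.e., m_opt ~ prefactor * lam ** slope.
slope, intercept = np.polyfit(np.log(lam), np.log(m_opt), 1)
prefactor = np.exp(intercept)
print(slope, prefactor)
```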
[Figure 13 plots: three panels; the horizontal axes show NK from 20 to 80.]
Figure 13: The relative error εK as a function of NK in the data-driven DNN, PINN–Darcy,
and MPINN methods in problems with (a) λ = 0.2, (b) λ = 0.5, and (c) λ = 1.0. In these
examples, Nh = 40, NC = 100, Nfh = 1000, NfC = 1000, and the DNN size is 3 × 60.
Figure 13 compares the approximation errors in the data-driven DNN, PINN–Darcy,
and MPINN methods as functions of NK for the three conductivity fields.
In these simulations, we use Nh = 40, NC = 100, Nfh = 1000, and NfC = 1000.
For all three correlation lengths, we see that adding physics constraints
improves the accuracy of the DNN approximation of the K field. The biggest
reduction in the estimation error is achieved by adding h measurements and the
Darcy equation constraint, as shown by the comparison of the data-driven DNN
and PINN–Darcy estimation errors. Adding C measurements and advection–
dispersion equation constraints further reduces the approximation error. The
advantage of MPINN is especially pronounced for sparse data (small NK ) and
Table 3: Relative errors εh and εC for problems with λ = 0.2, 0.5, and 1.0. In these simulations,
Nh = 40, NC = 100, Nfh = 1000, NfC = 1000, and the DNN size is 3 × 60. The data-driven DNN
errors do not depend on NK, so a single value is listed; "-" marks entries that are not reported.
                  εh                                εC
λ     Method      NK=20   NK=40   NK=60   NK=80    NK=20   NK=40   NK=60   NK=80
0.2   DNN         4.72% (all NK)                    18.25% (all NK)
0.2   PINN        6.82%   6.49%   4.74%   4.35%    -       -       -       -
0.2   MPINN       6.71%   6.39%   5.19%   3.74%    8.60%   7.35%   5.91%   7.10%
0.5   DNN         1.75% (all NK)                    18.65% (all NK)
0.5   PINN        0.94%   0.92%   0.75%   0.57%    -       -       -       -
0.5   MPINN       1.04%   0.69%   0.75%   0.48%    2.02%   1.14%   1.41%   1.36%
1.0   DNN         1.28% (all NK)                    16.72% (all NK)
1.0   PINN        6.43%   2.58%   0.72%   0.60%    -       -       -       -
1.0   MPINN       2.53%   0.95%   0.74%   0.63%    3.51%   1.20%   1.13%   1.35%
small correlation lengths. For example, for λ = 0.2 and NK = 20, the εK errors
are 1.8, 0.66, and 0.57 in the data-driven DNN, PINN–Darcy, and MPINN
methods, respectively.
Table 3 lists εh and εC as functions of NK for the data-driven DNN, PINN–Darcy,
and MPINN methods and the K fields with λ = 0.2, 0.5, and 1. We
can see here that the MPINN method provides significantly improved concentration
estimations compared to the data-driven DNN method, and that the PINN–Darcy and
MPINN methods improve the hydraulic head estimation in most of the cases listed
in Table 3. We note that Nh and NC are fixed in this comparison
study, and the data-driven DNN hydraulic head and concentration estimates
do not depend on NK. Moreover, the PINN–Darcy and MPINN estimations of
h and C generally improve with increasing NK. This demonstrates the capability of the
physics-informed DNNs to learn from indirect measurements. The improvements
are particularly pronounced for estimating (highly nonlinear) C(x), e.g., for
λ = 1 and NC = 100, εC decreases from 16.72% in the data-driven DNN to
1.35% in MPINN.
Figures 14–16 show the K̂(x), ĥ(x), and Ĉ(x) DNNs estimated with the
data-driven DNN, PINN–Darcy, and MPINN methods, where λ = 0.5, NK = 40,
Nh = 40, NC = 100, Nfh = 1000, and NfC = 1000. In Figure 14, the comparison
of the estimated and reference K fields shows that PINN–Darcy significantly
improves on the data-driven DNN prediction. MPINN further improves the K
estimation, as indicated by the smaller εK. The data-driven DNN approximation
near the upper left corner significantly differs from the ground truth K field due
to the lack of measurements in this region. However, the approximation error
around this area is greatly reduced in the PINN–Darcy and MPINN methods,
which leverage indirect observations (i.e., head and concentration observations)
located in this area, as shown in Figures 14 (c) and (d).
Figure 14: (a) The reference K field (λ = 0.5) and the relative L2 error εK and absolute point
errors in K̂(x, θK ) trained with the (b) data-driven DNN, (c) PINN–Darcy, and (d) MPINN
methods. Locations of K measurements are denoted by black circles.
(a) Reference (b) εh = 1.75%
Figure 15: (a) The reference h field and the relative L2 errors εh and absolute errors in ĥ(x, θh )
trained with the (b) data-driven DNN, (c) PINN–Darcy, and (d) MPINN methods. Locations
of h measurements are denoted by black circles.
PINN–Darcy and MPINN approaches. For the highly nonlinear C field, the
data-driven DNN estimate is significantly less accurate than that found using
MPINN in terms of both the point and L2 errors, as shown in Figure 16. Notably,
MPINN is able to accurately describe the eye of the concentration plume with
very few direct measurements near this region. Once again, this demonstrates
that MPINN can use sparse direct and indirect measurements in combination
with PDEs to capture local features that otherwise cannot be described with
only direct measurements.
Finally, we investigate whether using the optimal-size K̂ DNN, as determined
in Section 4.1, would reduce the error in the estimated K, h, and C fields. As
an example, we choose the case with λ = 0.2. According to Figure 11 (a),
the optimal K̂ size for a field with λ = 0.2 is mh = 90. Table 4 presents the
(a) Reference (b) εC = 18.65% (c) εC = 1.14%
Figure 16: (a) The reference C field, the relative L2 errors εC, and absolute errors in Ĉ(x, θC )
trained with the (b) data-driven DNN and (c) MPINN methods. Locations of C measurements
are denoted by black circles.
estimation errors for the K̂ DNNs with mh = 60, 90, and 120. In this comparison
study, we fix the ĥ and Ĉ DNNs’ size at mh = 60 and use NK = 80, Nh = 40,
NC = 100 measurements, and Nfh = 1000 and NfC = 1000 residual points. We
can see that the optimal-size K̂ produces the smallest estimation errors not only
for the K field but also for the h field in the data-driven DNN, PINN–Darcy, and
MPINN methods. For the C field, the smallest error is achieved with mh = 60
in the K̂ DNN. This indicates that a smaller estimation error in K and h does
not always translate to a smaller error in C.
Table 4: The relative errors εK, εh, and εC in the data-driven DNN, PINN–Darcy, and MPINN
methods as functions of mh in the K̂ DNN. The DNN architecture of the K̂, ĥ, and Ĉ DNNs
is 3 × mh ; and mh = 60 in the ĥ and Ĉ DNNs. In these examples, NK = 80, Nh = 40,
NC = 100, Nfh = 1000, NfC = 1000, and λ = 0.2. Entries marked "-" are not reported.
           mh = 60                  mh = 90                  mh = 120
Method     εK      εh      εC       εK      εh      εC       εK      εh      εC
DNN        64.8%   -       -        54.2%   -       -        60.5%   -       -
PINN       49.2%   4.35%   -        48.5%   3.95%   -        51.7%   4.40%   -
MPINN      41.9%   3.74%   7.10%    40.2%   3.64%   11.3%    53.1%   4.02%   8.95%
5. Conclusion
In this study, we presented the MPINN approach for data assimilation with
a focus on parameter and state estimation in subsurface transport problems. In
this approach, all unknown space-dependent parameters and states are modeled
with DNNs that are jointly trained by minimizing the loss function containing
the multiphysics data (e.g., conductivity, hydraulic head, and concentration
measurements) and the associated physics constraints, including the Darcy
435 and advection–dispersion equations. As a result, the DNNs can be trained
using indirect measurements and underlying physics in an unsupervised learning
fashion, which is important when the data is sparse.
We compared three DNN methods: (1) the pure data-driven DNN approach,
which only uses data to train DNNs; (2) the PINN approach, called
"PINN–Darcy," which utilizes the conductivity and hydraulic head measurements
and the Darcy equation; and (3) the MPINN approach, which combines the
conductivity, head, and concentration measurements with the Darcy and
advection–dispersion equations.
Our numerical results show that both physics-informed methods (PINN–Darcy
and MPINN) are significantly more accurate for parameter estimation
than the data-driven DNN method; the physics-informed methods provide
regularization and reduce the uncertainty in DNN predictions, especially when the
direct measurements are limited. Furthermore, MPINN yields better parameter
and state estimation than PINN–Darcy.
We investigated the effect of the neural network size on the accuracy of
parameter and state estimation as a function of the correlation length of the
modeled K field. We demonstrated that in pure data-driven regression, networks
that are too small or too large can result in poor representability or
overfitting, respectively, and that the optimal DNN size increases with
decreasing correlation length. The physics constraints and added measurements
reduce the dependence of the DNN prediction on the DNN size, provided that the
DNN is large (representative) enough. However, for a small number of
measurements, we demonstrated that an optimal-size DNN outperforms both the
larger and smaller DNNs.
In subsurface applications, data is usually sparse and is often indirect.
Therefore, the MPINN approach offers a flexible and unified framework to deal with
sparse and multiphysics data. Because the proposed method involves training
DNNs by minimizing the loss function, the performance of training algorithms is
crucial. In our study, we found that introducing nonlinear PDE constraints into
the loss function increases the computational cost of training. Application of the
physics-informed DNNs to large-scale problems will require access to multi-GPU
computers and scalable training algorithms. The selection of training algorithms
and hyperparameters (learning rate, architecture of DNNs, etc.) should also be
studied in more detail.
Acknowledgements
This research was partially supported by the U.S. Department of Energy
(DOE) Advanced Scientific Computing Research (ASCR) program. PNNL is operated by
Battelle for the DOE under Contract DE-AC05-76RL01830.
[Figure: schematic of a fully connected DNN with input x, hidden layers, and output u(x).]
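\[
\begin{aligned}
y_2(x) &= \sigma(W_1 x + b_1), \\
y_3(y_2) &= \sigma(W_2 y_2 + b_2), \\
&\;\;\vdots \\
y_{n_l}(y_{n_l-1}) &= \sigma(W_{n_l-1} y_{n_l-1} + b_{n_l-1}), \\
y_{n_l+1}(y_{n_l}) &= W_{n_l} y_{n_l} + b_{n_l}.
\end{aligned}
\tag{A.2}
\]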
The first layer is called the input layer, and the last layer is the output layer,
while all the intermediate layers are known as hidden layers. Here, $n_l$ denotes
the number of hidden layers, $\sigma$ is the predefined activation function,
$x \in \mathbb{R}^d$ denotes the input ($d$ is the number of spatial dimensions),
$y_{n_l+1}$ is the output vector, and $\theta$ denotes all weight and bias
parameters in the DNN approximation of $u$:
\[
\theta = \{W_1, W_2, \ldots, W_{n_l}, b_1, b_2, \ldots, b_{n_l}\}. \tag{A.3}
\]
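The layer recursion in Equation (A.2) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' implementation: the tanh activation, the Xavier-style random initialization, and the helper names `init_params`, `affine`, and `forward` are all assumptions made for the example.

```python
import math
import random

def init_params(sizes, seed=0):
    """Return theta as a list of (W, b) pairs for consecutive layer widths."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / (n_in + n_out))  # Xavier-style scaling (assumed)
        W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        params.append((W, b))
    return params

def affine(W, b, y):
    """Compute W y + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, y)) + bi for row, bi in zip(W, b)]

def forward(params, x, sigma=math.tanh):
    """Evaluate Eq. (A.2): sigma after every layer except the last, which is linear."""
    y = list(x)
    for W, b in params[:-1]:
        y = [sigma(z) for z in affine(W, b, y)]
    W, b = params[-1]
    return affine(W, b, y)

# d = 2 inputs, three hidden layers of width 60, one output
theta = init_params([2, 60, 60, 60, 1])
u_hat = forward(theta, [0.3, 0.7])   # list holding the single output value
```

The widths `[2, 60, 60, 60, 1]` mirror the 3 × mh layout with mh = 60 quoted in Table 4; any other widths work the same way.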
In the "data-driven" approach, $\theta$ is directly estimated from the measurements of
$u$ by minimizing the loss function
$L(\theta) = \sum_{x \in T_u} (\hat{u}(x; \theta) - u^*(x))^2$:
\[
\theta = \arg\min_{\theta} \sum_{x \in T_u} (\hat{u}(x; \theta) - u^*(x))^2. \tag{A.4}
\]
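To make the minimization in Equation (A.4) concrete, the sketch below replaces the DNN with a hypothetical one-parameter model û(x; θ) = θx, so that the result can be checked against the closed-form least-squares solution. Fixed-step gradient descent is used only as a simple stand-in for a practical optimizer, and the measurement pairs are made up for illustration.

```python
def loss(theta, data):
    """Sum of squared misfits L(theta) from Eq. (A.4) for the toy model u_hat = theta * x."""
    return sum((theta * x - u) ** 2 for x, u in data)

def grad(theta, data):
    """Analytic derivative dL/dtheta for the toy model."""
    return sum(2.0 * (theta * x - u) * x for x, u in data)

# Hypothetical measurement set T_u as pairs (x, u*(x))
T_u = [(0.0, 0.1), (0.5, 1.0), (1.0, 2.1), (1.5, 2.9)]

theta = 0.0
for _ in range(200):
    theta -= 0.05 * grad(theta, T_u)   # fixed-step gradient descent

# Closed-form minimizer of the same quadratic loss, used as a check
theta_exact = sum(x * u for x, u in T_u) / sum(x * x for x, _ in T_u)
```

For a real DNN the loss is not quadratic in θ, so no closed form exists and the gradient would come from automatic differentiation rather than a hand-written formula.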
Next, the residuals of Equations (7) and (8) are expressed in terms of $\theta_K$,
$\theta_h$, and $\theta_C$ as:
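To enforce the Neumann boundary conditions for Equations (7) and (8), we
define DNNs that approximate fluxes at the boundaries:
\[
\begin{aligned}
f_{N_1}^h(x; \theta_K, \theta_h) &= -\hat{K}(x)\, \partial \hat{h}(x) / \partial x_1 - q, \\
f_{N_2}^h(x; \theta_K, \theta_h) &= -\hat{K}(x)\, \partial \hat{h}(x) / \partial x_2,
\end{aligned}
\tag{B.2}
\]
and
\[
\begin{aligned}
f_{N_1}^C(x; \theta_C) &= \partial \hat{C}(x) / \partial x_1, \\
f_{N_2}^C(x; \theta_C) &= \partial \hat{C}(x) / \partial x_2.
\end{aligned}
\tag{B.3}
\]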
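The boundary residuals in Equations (B.2) and (B.3) can be evaluated for any differentiable surrogates of K, h, and C. The sketch below is illustrative only: it uses central finite differences in place of the automatic differentiation normally used with PINNs, and the analytic functions `K_hat`, `h_hat`, and `C_hat` are hypothetical stand-ins for trained networks.

```python
import math

def partial(f, x, k, eps=1e-6):
    """Central finite-difference estimate of df/dx_k at x (stand-in for AD)."""
    xp, xm = list(x), list(x)
    xp[k] += eps
    xm[k] -= eps
    return (f(xp) - f(xm)) / (2.0 * eps)

def f_N1_h(K_hat, h_hat, x, q):
    """First residual in Eq. (B.2): -K dh/dx1 - q."""
    return -K_hat(x) * partial(h_hat, x, 0) - q

def f_N2_h(K_hat, h_hat, x):
    """Second residual in Eq. (B.2): -K dh/dx2."""
    return -K_hat(x) * partial(h_hat, x, 1)

def f_N_C(C_hat, x, k):
    """Residuals in Eq. (B.3): dC/dx1 for k = 0, dC/dx2 for k = 1."""
    return partial(C_hat, x, k)

# Hypothetical surrogates
def K_hat(x):
    return 2.0

def h_hat(x):
    return 1.0 - 0.5 * x[0]      # dh/dx1 = -0.5, dh/dx2 = 0

def C_hat(x):
    return math.exp(-x[0])       # independent of x2, so dC/dx2 = 0

x_b = [0.0, 0.4]                            # a hypothetical boundary point
r1 = f_N1_h(K_hat, h_hat, x_b, q=1.0)       # -2 * (-0.5) - 1, close to 0
r2 = f_N2_h(K_hat, h_hat, x_b)
r3 = f_N_C(C_hat, x_b, 1)
```

With these surrogates all three residuals vanish up to round-off, which is the condition the boundary loss terms drive the networks toward.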
The loss function is then defined as:
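\[
\begin{aligned}
J_d(\theta_K, \theta_h, \theta_C) ={}& \frac{1}{N_K} \sum_{i=1}^{N_K} [\hat{K}(x_i^K; \theta_K) - K_i^*]^2 \\
&+ \frac{1}{N_h} \sum_{i=1}^{N_h} [\hat{h}(x_i^h; \theta_h) - h_i^*]^2 \\
&+ \frac{1}{N_C} \sum_{i=1}^{N_C} [\hat{C}(x_i^C; \theta_C) - C_i^*]^2,
\end{aligned}
\tag{B.5}
\]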
and the losses due to the PDE constraints and boundary
conditions are:
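\[
\begin{aligned}
J_f^h(\theta_K, \theta_h) &= \frac{1}{|T_f^h|} \sum_{x \in T_f^h} [f^h(x; \theta_K, \theta_h)]^2, \\
J_f^C(\theta_K, \theta_h, \theta_C) &= \frac{1}{|T_f^C|} \sum_{x \in T_f^C} [f^C(x; \theta_K, \theta_h, \theta_C)]^2, \\
J_{N_1}^h(\theta_K, \theta_h) &= \frac{1}{|T_{N_1}^h|} \sum_{x \in T_{N_1}^h} [f_{N_1}^h(x; \theta_K, \theta_h)]^2, \\
J_{N_2}^h(\theta_K, \theta_h) &= \frac{1}{|T_{N_2}^h|} \sum_{x \in T_{N_2}^h} [f_{N_2}^h(x; \theta_K, \theta_h)]^2, \\
J_{N_1}^C(\theta_C) &= \frac{1}{|T_{N_1}^C|} \sum_{x \in T_{N_1}^C} [f_{N_1}^C(x; \theta_C)]^2, \\
J_{N_2}^C(\theta_C) &= \frac{1}{|T_{N_2}^C|} \sum_{x \in T_{N_2}^C} [f_{N_2}^C(x; \theta_C)]^2, \\
J_b^h(\theta_h) &= \frac{1}{|T_b^h|} \sum_{x \in T_b^h} [\hat{h}(x; \theta_h) - h^*(x)]^2, \\
J_b^C(\theta_C) &= \frac{1}{|T_b^C|} \sum_{x \in T_b^C} [\hat{C}(x; \theta_C) - C^*(x)]^2.
\end{aligned}
\tag{B.6}
\]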
In Equation (B.6), PDEs (7) and (8) are enforced at the residual points given
by the sets $T_f^h$ and $T_f^C$, respectively, where $|T_f^h| = N_f^h$ and
$|T_f^C| = N_f^C$. The terms with the subscripts $N_1$ or $N_2$ enforce the
Neumann boundary conditions, and those with the subscript $b$ enforce the
Dirichlet boundary conditions.
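Every term in Equations (B.5) and (B.6) has the same shape: a mean of squared residuals over a point set. The sketch below illustrates that pattern under stated assumptions. The Darcy residual is taken as div(K grad h) because Equation (7) is not reproduced here, derivatives use central finite differences instead of automatic differentiation, and all surrogates, point sets, and measurement values are hypothetical.

```python
def mean_square(residual, points):
    """Return (1/|T|) * sum of residual(x)**2 over x in T, as in Eq. (B.6)."""
    return sum(residual(x) ** 2 for x in points) / len(points)

def misfit_loss(u_hat, points, values):
    """One data term of Eq. (B.5), or a Dirichlet term of Eq. (B.6)."""
    return sum((u_hat(x) - v) ** 2 for x, v in zip(points, values)) / len(points)

def J_d(terms):
    """Eq. (B.5): sum of misfit terms, one (model, points, values) triple per field."""
    return sum(misfit_loss(m, p, v) for m, p, v in terms)

def partial(f, x, k, eps=1e-4):
    """Central finite-difference estimate of df/dx_k (stand-in for AD)."""
    xp, xm = list(x), list(x)
    xp[k] += eps
    xm[k] -= eps
    return (f(xp) - f(xm)) / (2.0 * eps)

def darcy_residual(K_hat, h_hat, x):
    """Assumed form of f^h: div(K grad h), steady flow without sources."""
    total = 0.0
    for k in range(len(x)):
        flux_k = lambda z, k=k: K_hat(z) * partial(h_hat, z, k)
        total += partial(flux_k, x, k)
    return total

# Hypothetical surrogates standing in for trained DNNs
def K_hat(x):
    return 2.0

def h_hat(x):
    return 1.0 - 0.5 * x[0]

# Hypothetical point sets and measurements
T_f = [[0.1 * i, 0.1 * j] for i in range(1, 10) for j in range(1, 10)]
x_K, K_star = [[0.2, 0.2], [0.8, 0.5]], [2.0, 2.5]
x_h, h_star = [[0.4, 0.3]], [0.8]
x_b, h_b = [[1.0, 0.1 * j] for j in range(11)], [0.5] * 11

J_f_h = mean_square(lambda x: darcy_residual(K_hat, h_hat, x), T_f)
J_data = J_d([(K_hat, x_K, K_star), (h_hat, x_h, h_star)])   # 0.125 from the K term
J_b_h = misfit_loss(h_hat, x_b, h_b)
```

Here the PDE and Dirichlet terms are essentially zero because the surrogate head is linear and matches the boundary value, while the data term is nonzero because one conductivity measurement disagrees with the constant `K_hat`. How the individual terms are weighted and summed into the total loss is not shown in this section, so the sketch leaves them separate.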
33
[3] E. Barbier, Geothermal energy technology and current status: An overview
495 (2002). doi:10.1016/S1364-0321(02)00002-3.
[4] J. C. Helton, Uncertainty and sensitivity analysis techniques for use in per-
formance assessment for radioactive waste disposal, Reliability Engineering
and System Safetydoi:10.1016/0951-8320(93)90097-I.
34
520 [11] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic
model using Monte Carlo methods to forecast error statistics, Journal of
Geophysical Research.
[13] G. Evensen, The Ensemble Kalman Filter: Theoretical formulation and prac-
tical implementation, Ocean Dynamicsdoi:10.1007/s10236-003-0036-9.
35
[20] B. Ramsundar, R. B. Zadeh, TensorFlow for deep learning: from linear
regression to reinforcement learning, " O’Reilly Media, Inc.", 2018.
[26] E. Weinan, B. Yu, The Deep Ritz Method: A Deep Learning-Based Nu-
merical Algorithm for Solving Variational Problems, Communications in
Mathematics and Statistics 6 (1) (2018) 1–14. arXiv:arXiv:1710.00211v1,
doi:10.1007/s40304-018-0127-z.
36
[28] L. Lu, X. Meng, Z. Mao, G. E. Karniadakis, DeepXDE: A deep learning
575 library for solving differential equations (2019) 1–17arXiv:1907.04502.
URL http://arxiv.org/abs/1907.04502
585 [31] D. P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization (2014)
1–15arXiv:1412.6980.
URL http://arxiv.org/abs/1412.6980
37
[36] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedfor-
ward neural networks, in: Journal of Machine Learning Research, 2010.
38