
Journal of Computational Physics 509 (2024) 113059


Parallel ensemble Kalman method with total variation regularization for large-scale field inversion
Xin-Lei Zhang a,b , Lei Zhang a,b , Guowei He a,b,∗
a The State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100190, China
b School of Engineering Sciences, University of Chinese Academy of Sciences, Beijing 100049, China

A R T I C L E  I N F O

Keywords:
Ensemble Kalman method
Parallel implementation
Field inversion
Data assimilation
Machine learning

A B S T R A C T

Field inversion is often encountered in data-driven computational modeling to infer latent spatially varying parameters from available observations. The ensemble Kalman method is emerging as a useful tool for solving field inversion problems due to its derivative-free merits. However, the method is computationally prohibitive for large-scale field inversion with high-dimensional observation data, which necessitates developing a practical, efficient implementation strategy. In this work, we propose a parallel implementation of the ensemble Kalman method with total variation regularization for large-scale field inversion problems. It is achieved by partitioning the computational domain into non-overlapping subdomains and performing local ensemble Kalman updates on each subdomain in parallel. In doing so, the computational complexity of the ensemble-based inversion method is significantly reduced to the level of the local subdomains. Further, total variation regularization is employed to smoothen the physical field over the entire domain, which reduces the inference discrepancy caused by missing covariances near subdomain interfaces. The capability of the proposed method is demonstrated in three field inversion problems of increasing complexity, i.e., the diffusion problem, the scalar transport problem, and the Reynolds-averaged Navier-Stokes closure problem. The numerical results show that the proposed method significantly improves computational efficiency with satisfactory inference accuracy.

1. Introduction

Field inversion aims to infer unknown spatially varying parameters of physical systems from available observations [1,2] and is frequently encountered in computational physics applications. Take a heated rod with nonuniform thermal diffusivity as an example. The diffusivity distribution along the rod is much more difficult to measure than the temperature. This fact motivates the field inversion problem of inferring the latent field, i.e., the thermal diffusivity, from the observation data, i.e., the temperature distribution.
Field inversion problems amount to finding the unknown physical quantities that minimize the misfit between model predictions and observation data. The inverse problem involves high-dimensional spatially varying parameters, in contrast to typical parameter estimation problems where the inferred parameter is low-dimensional [3]. Both adjoint [4] and ensemble-based [5] methods have been developed to tackle field inversion problems. The adjoint-based approaches can guide inference

* Corresponding author.
E-mail address: hgw@lnm.imech.ac.cn (G. He).

https://doi.org/10.1016/j.jcp.2024.113059
Received 18 September 2023; Received in revised form 20 March 2024; Accepted 25 April 2024
Available online 3 May 2024
0021-9991/© 2024 Elsevier Inc. All rights reserved.

processes with model derivatives by solving certain adjoint equations [6]. However, adjoint solvers are often not readily available,
particularly for legacy code, which requires extra effort in code re-developments and maintenance. In contrast, the ensemble Kalman
method [5,7] is a derivative-free statistical inference method, circumventing the development of the adjoint solvers. This method
uses sample statistics to implicitly approximate the model gradient [8,9]. Moreover, it can achieve a second-order optimization with
good inference efficiency by introducing the low-rank approximated Hessian information [10]. The ensemble Kalman method has
been successfully applied in different field inversion problems, such as inferring the Reynolds stress tensor field [11], eddy viscosity
field [12,13], and spatial characteristics of jet noise sources [14] from mean flow observations. However, these works mainly focus on scenarios with sparse observation data, while ensemble-based field inversion with high-dimensional data remains largely unexplored.
Field inversion with large amounts of observation data is often encountered in practical applications of fluid dynamics. For instance, particle image velocimetry experiments can produce large amounts of time-dependent velocity observations. Using such large datasets for field inversion significantly increases the computational cost of the ensemble Kalman inversion [15,16]. That is because the update scheme typically involves the inversion of a full-rank matrix whose rank is as large as the observation dimension, which leads to a computational complexity that grows cubically with the amount of data. Therefore, it is necessary to develop efficient implementation strategies for the ensemble Kalman method.
Efficient implementation of ensemble-based approaches has been explored in different ways for handling large datasets. For in-
stance, the Sherman-Morrison-Woodbury formula can be applied to compute the Kalman gain matrix in the ensemble-based update
scheme [17]. This matrix manipulation can reduce the computational complexity to 𝑂(𝐷) where 𝐷 represents the data amount, in
contrast to the conventional implementation with complexity $O(D^3)$. Also, multi-level techniques [18,19] can be used to improve the efficiency of the ensemble Kalman method. These techniques leverage sample statistics on coarse grids to update flow fields at higher resolution, and have been applied to data assimilation of unsteady flows [20] and large-scale inverse problems [21]. Another widely used approach is to apply a parallel strategy based on domain decomposition [22]. The parallelization
approach can reduce not only the computational complexity but also the required memory size for matrix storage. Besides, computa-
tional fluid dynamics (CFD) solvers often run in parallel by dividing the computation domain into several subdomains. This feature
promotes using the parallel ensemble method for field inversion in CFD applications. Specifically, the computational domain is de-
composed into multiple subdomains, and the ensemble Kalman update is performed at each subdomain with individual processes
without inter-process communication. By doing so, both the computational complexity and memory consumption can be signifi-
cantly reduced to the level of the local subdomain. Therefore, the present work focuses on parallel strategies for ensemble-based field
inversion.
Various parallel strategies can enable the ensemble Kalman method to solve large-scale field inversion problems. For instance,
the local ensemble Kalman method [23] is proposed to perform ensemble Kalman analysis at each region in parallel. Such parallel
implementation of the local ensemble Kalman method can also be modified based on Cholesky decomposition to further improve
the inference efficiency [24]. These parallel ensemble Kalman methods have been successfully used in solving data assimilation
problems, i.e., inferring global state fields of physical systems from partial observations [25]. However, they often employ non-overlapping domain decomposition and neglect the covariance of variables between the subdomains, which can significantly affect the accuracy of the inferred field. One can improve the inference accuracy by communicating information near the subdomain interfaces, but this poses extra difficulties in searching for adjacent mesh cells, particularly on unstructured meshes. In view of these difficulties, it is desirable to develop a parallel ensemble Kalman inversion with improved computational efficiency, satisfactory inversion accuracy, and ease of implementation.
In this work we propose a novel parallel strategy of the ensemble Kalman method for solving field inversion problems with large
amounts of observation data. The parallelization strategy based on non-overlapping domain decomposition [26] is used to improve
the efficiency of ensemble-based Kalman updates. Further, the total variation regularization is employed to alleviate the effects of
the missing covariance information among subdomains on the inference accuracy. The approach parallelizes the Kalman analysis
scheme without requiring inter-subdomain communication, for ease of implementation. Moreover, the total variation regularization is achieved through a single algorithmic modification of the conventional ensemble Kalman method [27], which requires little effort in code development. On the other hand, the regularization step does not involve the observation data, and hence does not significantly increase the computational complexity of the ensemble method in scenarios with large data amounts.
The rest of the paper is structured as follows. The methodology of the ensemble Kalman method and its parallel implementation
is described in Section 2. The case details and numerical results are illustrated in Section 3. The conclusion is provided in Section 4.

2. Methodology

Consider a physical system that has an unknown quantity $\mathsf{x} \in \mathbb{R}^{N}$ (e.g., diffusivity in the heat equation) and available observations $\mathsf{y} \in \mathbb{R}^{D}$ (e.g., temperature). The unknown quantity $\mathsf{x}$ obeys the Gaussian process $\mathcal{GP}(\mathsf{x}_0, \mathcal{K})$, where $\mathsf{x}_0$ is the initial guess and $\mathcal{K}$ is the prescribed model error covariance kernel. The observation model can be formulated as

$\mathsf{y} = \mathcal{H}[\mathsf{x}^{+}] + \boldsymbol{\epsilon},$  (1)

where $\mathcal{H}$ indicates the model operator that maps the reference quantities $\mathsf{x}^{+}$ to the observation space, and $\boldsymbol{\epsilon}$ is additive observation noise, assumed to be an independent and identically distributed Gaussian random vector with zero mean and covariance $\mathsf{R}$.
The field inversion problem amounts to inferring the unknown spatially varying quantity $\mathsf{x}$ from the available observation data $\mathsf{y}$. It can be converted into a minimization problem, and the corresponding cost function to be minimized is written as


$\min J = \left\| \mathsf{y} - \mathcal{H}[\mathsf{x}] \right\|^{2}.$  (2)


Such minimization problems often involve solving differential equations to obtain $\mathcal{H}[\mathsf{x}]$, and typically require solving certain adjoint equations to estimate the gradient of the cost function $J$ with respect to the unknown quantity $\mathsf{x}$. Alternatively, ensemble-based methods [7,28,29,27] can be used to solve the inverse problem; these are derivative-free and do not require developing adjoint solvers. In this work, we adopt the ensemble Kalman method to tackle the field inversion problem.

2.1. Ensemble Kalman method for field inversion

The ensemble Kalman inversion (hereafter referred to as EnKI) method [7,30,31] is a statistical inference method based on the
Monte Carlo sampling technique. The statistics of random samples are used to estimate the sensitivity of the cost function with
respect to the model parameters, circumventing efforts in developing the adjoint solvers. For the conventional ensemble Kalman
method, the corresponding cost function is formulated as
$\min J = \left\| \mathsf{y}_j - \mathcal{H}[\mathsf{x}_j^{i+1}] \right\|_{\mathsf{R}}^{2} + \left\| \mathsf{x}_j^{i+1} - \mathsf{x}_j^{i} \right\|_{\mathsf{P}}^{2},$  (3)

where $i$ is the iteration index, $j$ is the sample index, the norm $\|\cdot\|_{\mathsf{A}}^{2}$ is defined as $\|\boldsymbol{\nu}\|_{\mathsf{A}}^{2} = \boldsymbol{\nu}^{\top}\mathsf{A}^{-1}\boldsymbol{\nu}$ for a vector $\boldsymbol{\nu}$ and weight matrix $\mathsf{A}$, and $\mathsf{P}$ and $\mathsf{R}$ are the model and observation error covariances, respectively. The observation data $\mathsf{y}$ are resampled at each iteration to avoid underestimating the posterior uncertainty.
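As a concrete reading of the weighted norm, $\|\boldsymbol{\nu}\|_{\mathsf{A}}^{2} = \boldsymbol{\nu}^{\top}\mathsf{A}^{-1}\boldsymbol{\nu}$ can be evaluated with a linear solve rather than an explicit inverse. The sketch below is illustrative only; the function name is ours, not from the paper.

```python
import numpy as np

def weighted_sq_norm(v, A):
    """Weighted norm ||v||_A^2 = v^T A^{-1} v used in the cost function (3).

    Solving A w = v avoids forming A^{-1} explicitly, which is cheaper and
    numerically safer for a covariance matrix A.
    """
    return float(v @ np.linalg.solve(A, v))
```

For $\mathsf{A} = \mathsf{I}$ the weighted norm reduces to the ordinary squared Euclidean norm.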
The Kalman update scheme can be derived by minimizing the cost function 𝐽 (3). It is formulated as [8]

$\mathsf{x}_j^{i+1} = \mathsf{x}_j^{i} + \mathsf{P}\mathsf{H}^{\top}\left(\mathsf{H}\mathsf{P}\mathsf{H}^{\top} + \mathsf{R}\right)^{-1}\left(\mathsf{y}_j - \mathsf{H}\mathsf{x}_j^{i}\right),$  (4)
where $\mathsf{H}$ is the local gradient of the observation operator $\mathcal{H}$ with respect to the inferred quantity $\mathsf{x}$. In practical implementations, the tangent linear operator $\mathsf{H}$ need not be computed explicitly, owing to the reformulation

$\mathsf{P}\mathsf{H}^{\top} = \mathsf{S}_x \mathsf{S}_y^{\top} \quad \text{and} \quad \mathsf{H}\mathsf{P}\mathsf{H}^{\top} = \mathsf{S}_y \mathsf{S}_y^{\top}.$  (5)
The square-root matrices $\mathsf{S}_x$ and $\mathsf{S}_y$ are estimated from the ensemble of samples at each iteration as

$\mathsf{S}_x^{i} = \frac{1}{\sqrt{N_e - 1}}\left[ \mathsf{x}_1^{i} - \bar{\mathsf{x}}^{i},\ \mathsf{x}_2^{i} - \bar{\mathsf{x}}^{i},\ \cdots,\ \mathsf{x}_{N_e}^{i} - \bar{\mathsf{x}}^{i} \right],$  (6a)

$\mathsf{S}_y^{i} = \frac{1}{\sqrt{N_e - 1}}\left[ \mathcal{H}[\mathsf{x}_1^{i}] - \mathcal{H}[\bar{\mathsf{x}}^{i}],\ \mathcal{H}[\mathsf{x}_2^{i}] - \mathcal{H}[\bar{\mathsf{x}}^{i}],\ \cdots,\ \mathcal{H}[\mathsf{x}_{N_e}^{i}] - \mathcal{H}[\bar{\mathsf{x}}^{i}] \right],$  (6b)

$\bar{\mathsf{x}}^{i} = \frac{1}{N_e}\sum_{j=1}^{N_e} \mathsf{x}_j^{i},$  (6c)

where 𝑁𝑒 is the ensemble size. As such, the term 𝖯𝖧⊤ represents the covariance between the uncertain quantity 𝗑 and the model
prediction 𝖧𝗑, while the term 𝖧𝖯𝖧⊤ represents the variance of model predictions 𝖧𝗑.
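The square-root form of the update in Eqs. (4)-(6) can be sketched compactly with NumPy. The snippet below is a minimal illustration, not the paper's implementation; the function name and interface are our own, and the observation resampling described above is included.

```python
import numpy as np

def enki_update(X, HX, y, R, rng):
    """One stochastic EnKI iteration, Eqs. (4)-(6).

    X  : (N, Ne) ensemble of the latent field x
    HX : (D, Ne) ensemble propagated to observation space, H[x_j]
    y  : (D,)    observation vector
    R  : (D, D)  observation error covariance
    """
    Ne = X.shape[1]
    # anomaly (square-root) matrices, Eq. (6)
    Sx = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(Ne - 1)
    Sy = (HX - HX.mean(axis=1, keepdims=True)) / np.sqrt(Ne - 1)
    # PH^T = Sx Sy^T and HPH^T = Sy Sy^T, Eq. (5); inverting the D x D
    # matrix below is the O(D^3) bottleneck that motivates parallelization
    K = Sx @ Sy.T @ np.linalg.inv(Sy @ Sy.T + R)
    # resample the observations for each ensemble member
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, Ne).T
    return X + K @ (Y - HX)   # Eq. (4)
```

With a linear identity observation operator, repeated calls drive the ensemble mean toward the data, as one would expect from Eq. (4).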
The update scheme of the ensemble Kalman method has a computational complexity of approximately $O(D^3)$ [17] for cases with large data amounts; the details of each matrix operation in the ensemble Kalman method are presented in Appendix A. The computational complexity therefore increases significantly with the amount of observation data. For this reason, a parallel implementation based on domain decomposition is needed to reduce the complexity. The conventional and the proposed novel parallel strategies are presented in the following.

2.2. Parallel ensemble Kalman method for field inversion

In order to improve the inference efficiency, the analysis scheme of the ensemble Kalman method can be implemented in parallel based on domain decomposition [23,22]. Specifically, the computational domain $\Omega$ is decomposed into multiple non-overlapping subdomains $\Omega^{[1]}, \cdots, \Omega^{[k]}, \cdots, \Omega^{[M]}$, where $M$ is the number of subdomains and $k$ is the subdomain index. On each subdomain, the local quantities $\mathsf{x}_{[k]}$ are updated with the ensemble Kalman method based on the local observations $\mathsf{y}_{[k]}$.
The update scheme of the parallel ensemble Kalman method is similar to Eq. (4) but is performed locally at each subdomain. It
can be formulated as

$\mathsf{x}_{[k]}^{i+1} = \mathsf{x}_{[k]}^{i} + \mathsf{P}_{[k]}\mathsf{H}_{[k]}^{\top}\left(\mathsf{H}_{[k]}\mathsf{P}_{[k]}\mathsf{H}_{[k]}^{\top} + \mathsf{R}_{[k]}\right)^{-1}\left(\mathsf{y}_{[k]} - \mathsf{H}_{[k]}\mathsf{x}_{[k]}^{i}\right).$  (7)

In practical implementations, the scheme can also be reformulated in terms of the square-root matrices as in Eq. (6). After the Kalman analysis, the updated quantities $\{\mathsf{x}_{[k]}\}_{k=1}^{M}$ on the subdomains are assembled to form the global field $\mathsf{x}$. The ensemble Kalman update can then be performed at a very low computational complexity of $O(d^3)$, where $d$ is the dimension of the observation data on a subdomain.
For convenience, the discretized domain is partitioned evenly, with the same number of mesh cells per subdomain, in this work. The Kalman update on each subdomain is independent and requires no mutual communication, which makes the proposed method notably straightforward to implement. We note that the matrix inversion in Eq. (7) can also be performed with the banding Cholesky decomposition [24] to


further reduce the computational cost. The update scheme in Eq. (7) is specific to local pointwise observations, e.g., velocity. Such observation data are often available from particle image velocimetry (PIV) experiments in practical applications. Moreover, pointwise observations can come in large amounts, which increases the computational complexity of the Kalman analysis significantly and necessitates the parallel implementation of the update algorithm. For global observations such as the lift force of an airfoil, the parallel ensemble Kalman inversion method needs to be modified accordingly by using the global observation for field inversion on each subdomain.
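A minimal sketch of the local update of Eq. (7), under the assumptions of evenly split state and observation indices, a diagonal $\mathsf{R}$, and pointwise observations aligned with the state partition (the function name and interface are ours). Each loop iteration is independent, so in a real solver each subdomain would run on its own process.

```python
import numpy as np

def penki_update(X, HX, y, r_var, n_sub, rng):
    """One PEnKI iteration, Eq. (7): independent local Kalman analyses on
    n_sub non-overlapping subdomains, followed by reassembly.

    X     : (N, Ne) ensemble of the latent field
    HX    : (D, Ne) ensemble mapped to pointwise observations
    y     : (D,)    observation vector
    r_var : float   observation error variance (R = r_var * I)
    """
    Ne = X.shape[1]
    Xnew = np.empty_like(X)
    x_parts = np.array_split(np.arange(X.shape[0]), n_sub)
    y_parts = np.array_split(np.arange(len(y)), n_sub)
    # each iteration is independent: no inter-subdomain communication,
    # so the loop can be distributed, e.g., one MPI rank per subdomain
    for xs, ys in zip(x_parts, y_parts):
        Xk, HXk = X[xs], HX[ys]
        Sx = (Xk - Xk.mean(1, keepdims=True)) / np.sqrt(Ne - 1)
        Sy = (HXk - HXk.mean(1, keepdims=True)) / np.sqrt(Ne - 1)
        # the inverted matrix is only d x d, with d = len(ys)
        K = Sx @ Sy.T @ np.linalg.inv(Sy @ Sy.T + r_var * np.eye(len(ys)))
        Yk = y[ys][:, None] + rng.normal(0.0, np.sqrt(r_var), (len(ys), Ne))
        Xnew[xs] = Xk + K @ (Yk - HXk)   # local update, assembled globally
    return Xnew
```

Note that the only quantities each subdomain touches are its own rows of `X`, `HX`, and `y`, which is exactly why no communication is needed.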
The parallel ensemble Kalman inversion (hereafter referred to as PEnKI) method can reduce the computational cost significantly and has been used for solving data assimilation problems [22]. However, it can degrade the fidelity of the inferred fields due to missing covariance information among the subdomains. For instance, the term $\mathsf{P}\mathsf{H}^{\top}$ in the Kalman gain matrix represents the covariance between the unknown quantity $\mathsf{x}$ and the model prediction $\mathcal{H}[\mathsf{x}]$. The parallel implementation ignores the covariance of quantities between a local subdomain and its neighbors, which can result in large discrepancies in the inferred field. Hence, improving the accuracy of the parallel ensemble Kalman method warrants further investigation.

2.3. Parallel ensemble Kalman method with total variation regularization

Here we propose a novel parallel ensemble Kalman inversion method with total variation regularization to constrain the inferred field. It is achieved by penalizing the differences between adjacent cells over the entire domain. The total variation regularization can improve the accuracy of the inferred field in two respects. On the one hand, the missing covariance among subdomains can lead to nonsmoothness of the inferred quantities near the interfaces between subdomains; such discrepancies can be reduced by the total variation regularization, which promotes the smoothness of the inferred field. On the other hand, the high-dimensional inverse problem is often ill-posed [11,14], i.e., different fields $\mathsf{x}$ can provide similar model predictions $\mathsf{H}\mathsf{x}$ in good agreement with the observation data. This can lead to local discrepancies in regions where the model prediction is insensitive to the inferred quantity $\mathsf{x}$. Adding the total variation regularization can alleviate the ill-posedness by penalizing local gradients of the inferred field [13]. The details of the proposed ensemble update scheme are given in the following.
The cost function with the total variation regularization can be formulated as
$J = \left\| \mathsf{y}_j - \mathsf{H}\mathsf{x}_j^{i+1} \right\|_{\mathsf{R}}^{2} + \left\| \mathsf{x}_j^{i+1} - \mathsf{x}_j^{i} \right\|_{\mathsf{P}}^{2} + \left\| \mathsf{G}[\mathsf{x}_j^{i}] \right\|_{\mathsf{W}}^{2},$  (8)

where the third term denotes the total variation regularization with $\mathsf{G}[\mathsf{x}] = \mathsf{x} - \mathsf{x}^{\mathrm{adj}}$, the superscript $\mathrm{adj}$ indicates the adjacent cell, and $\mathsf{W}$ is the covariance matrix that adjusts the strength of the total variation regularization. The covariance matrix $\mathsf{W}$ is computed following the conventional regularized ensemble Kalman method [27]. Specifically, it is formulated as

$\mathsf{W}^{-1} = \frac{\chi}{\|\mathsf{P}\|_{F}}\,\bar{\mathsf{W}}^{-1},$  (9)
where $\|\mathsf{P}\|_{F}$ is the Frobenius norm of the model covariance matrix $\mathsf{P}$ and $\chi$ is the regularization parameter. The matrix $\bar{\mathsf{W}}$ is taken as the identity matrix in this work. With formula (9), the magnitude of $\mathsf{W}$ is dynamically adjusted based on $\|\mathsf{P}\|_{F}$ while only the direction of the covariance matrix $\mathsf{P}$ is preserved, which counteracts the detrimental effect of sample collapse on the total variation regularization. The regularization parameter $\chi$ is modeled as a ramp function, since a large value at the initial steps would cause the observation data to be ignored and lead to inference divergence. In this work, we follow the regularized ensemble method [27] and formulate the parameter $\chi$ as

$\chi(i) = 0.5\left(\tanh\left(\frac{i-5}{2}\right) + 1\right).$  (10)
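The ramp function of Eq. (10) is a one-liner; the sketch below simply evaluates it.

```python
import numpy as np

def chi(i):
    """Ramp for the regularization strength, Eq. (10): near zero in the
    first iterations so the data dominate, approaching one after about
    iteration 5."""
    return 0.5 * (np.tanh((i - 5) / 2) + 1.0)
```
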
The update scheme can be derived by minimizing the cost function, and the details of the derivation can be found in Ref. [27]. The
update scheme of the proposed method with total variation regularization can be written as

𝗑̃ 𝑗 = 𝗑𝑖𝑗 − 𝖯𝖦⊤ 𝖶−1 [𝗑𝑖𝑗 ]𝖦[𝗑𝑖𝑗 ];


(11)
𝗑𝑖+1
𝑗 =𝗑̃ 𝑖𝑗 + 𝖪(𝗒𝑗 − 𝖧̃𝗑𝑖𝑗 ) with 𝖪 = 𝖯𝖧⊤ (𝖧𝖯𝖧⊤ + 𝖱)−1 .
The ensemble Kalman method with total variation regularization involves two update steps: one for the total variation regularization and the other for the Kalman correction. The first step applies a pre-correction to smoothen the entire inferred field, particularly in regions with low model sensitivity and near the subdomain interfaces. The second step performs the Kalman update to minimize the discrepancies between the model predictions and the observation data. The tangent linear operator $\mathsf{G}$ can be obtained by computing the finite difference operator [13]. Alternatively, its computation can be avoided, as for the term $\mathsf{P}\mathsf{H}^{\top}$, by the reformulation

$\mathsf{P}\mathsf{G}^{\top} = \mathsf{S}_x\mathsf{S}_g^{\top}.$  (12)
The square-root matrix $\mathsf{S}_g$ is estimated from the ensemble of samples at each iteration as

$\mathsf{S}_g^{i} = \frac{1}{\sqrt{N_e - 1}}\left[ \mathsf{G}[\mathsf{x}_1^{i}] - \mathsf{G}[\bar{\mathsf{x}}^{i}],\ \mathsf{G}[\mathsf{x}_2^{i}] - \mathsf{G}[\bar{\mathsf{x}}^{i}],\ \cdots,\ \mathsf{G}[\mathsf{x}_{N_e}^{i}] - \mathsf{G}[\bar{\mathsf{x}}^{i}] \right].$  (13)
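The TV pre-correction, i.e., the first line of Eq. (11) combined with Eqs. (9), (12), and (13), can be sketched for a 1D field, where the adjacency operator $\mathsf{G}$ reduces to a forward difference between neighboring cells. The function name and the uniform-grid assumption are ours, not the paper's.

```python
import numpy as np

def tv_precorrect(X, chi_i):
    """Global TV pre-correction (first line of Eq. (11)) for a 1D field.

    G[x] = x - x_adj is the forward difference between neighboring cells
    of a uniform 1D grid; W^{-1} = (chi / ||P||_F) * I per Eq. (9) with
    Wbar = I.  X is the (N, Ne) ensemble; chi_i the current ramp value.
    """
    Ne = X.shape[1]
    Sx = (X - X.mean(1, keepdims=True)) / np.sqrt(Ne - 1)
    GX = X[1:, :] - X[:-1, :]                 # G applied to each sample
    Sg = (GX - GX.mean(1, keepdims=True)) / np.sqrt(Ne - 1)
    PGt = Sx @ Sg.T                           # Eqs. (12)-(13)
    P_fro = np.linalg.norm(Sx @ Sx.T)         # ||P||_F in Eq. (9)
    Winv = (chi_i / P_fro) * np.eye(GX.shape[0])
    return X - PGt @ Winv @ GX                # smoothed ensemble x~
```

Setting `chi_i = 0` recovers the unregularized ensemble exactly, consistent with the ramp in Eq. (10) suppressing the regularization at early iterations.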
We further implement the ensemble Kalman method with total variation regularization in parallel. The first step in Eq. (11) is performed globally, without parallelization, to promote the smoothness of the entire field to be inferred. The


Table 1
Comparison of different ensemble-based inversion methods, including the EnKI, PEnKI, and PEnKI-TV methods, in updating schemes and computational complexity.

| method | update scheme | complexity |
| EnKI | $\mathsf{x}^{i+1} = \mathsf{x}^{i} + \mathsf{K}(\mathsf{y} - \mathsf{H}\mathsf{x}^{i})$ | $O(D^3)$ |
| PEnKI | $\mathsf{x}^{i+1}_{[k]} = \mathsf{x}^{i}_{[k]} + \mathsf{K}_{[k]}(\mathsf{y}_{[k]} - \mathsf{H}_{[k]}\mathsf{x}^{i}_{[k]})$; $\mathsf{x}^{i+1} = \{\mathsf{x}^{i+1}_{[k]}\}_{k=1}^{M}$ | $O(d^3)$ |
| PEnKI-TV | $\tilde{\mathsf{x}}^{i} = \mathsf{x}^{i} - \mathsf{P}\mathsf{G}^{\top}\mathsf{W}^{-1}\mathsf{G}[\mathsf{x}^{i}]$; $\mathsf{x}^{i+1}_{[k]} = \tilde{\mathsf{x}}^{i}_{[k]} + \mathsf{K}_{[k]}(\mathsf{y}_{[k]} - \mathsf{H}_{[k]}\tilde{\mathsf{x}}^{i}_{[k]})$; $\mathsf{x}^{i+1} = \{\mathsf{x}^{i+1}_{[k]}\}_{k=1}^{M}$ | $O(d^3 + N N_e^2)$ |

second step in Eq. (11) is similar to the conventional EnKI scheme except that it uses the regularized field $\tilde{\mathsf{x}}$. We parallelize the second step as in the local ensemble Kalman method [23] to improve computational efficiency. As such, the scheme of the second step can be reformulated as

$\mathsf{x}_{[k]}^{i+1} = \tilde{\mathsf{x}}_{[k]}^{i} + \mathsf{K}_{[k]}\left(\mathsf{y}_{[k]} - \mathsf{H}_{[k]}\tilde{\mathsf{x}}_{[k]}^{i}\right).$  (14)

This step updates the local quantities $\mathsf{x}_{[k]}$ on each subdomain based on the observation data $\mathsf{y}_{[k]}$. In summary, the first step smoothens the global field based on total variation regularization, which alleviates the ill-posedness and the missing covariance among subdomains. The second step then updates the local quantities with the ensemble Kalman method in parallel to accelerate the incorporation of the observation data.
The computational complexity of the proposed parallel ensemble Kalman inversion with total variation regularization (hereafter referred to as PEnKI-TV) is reduced significantly compared to the conventional EnKI method. Specifically, the inversion scheme involves two update steps. The computational complexity of the first update step is approximately $O(N N_e^2)$, as illustrated in Appendix A, while that of the second step is $O(d^3)$. The total complexity is still significantly lower than that of the conventional EnKI method, i.e., $O(D^3)$, in scenarios with large amounts of observation data. Compared to the PEnKI method, the proposed PEnKI-TV method has an additional regularization step, which improves the inference accuracy at a low computational cost. One could also perform the total variation regularization locally around the interfaces between subdomains; this would further reduce the computational complexity but poses difficulties in searching for mesh cells near the subdomain interfaces. The complexity of the proposed ensemble Kalman scheme is summarized in Table 1, together with the conventional update schemes.
Note that the proposed method uses the first-order total variation to measure the field smoothness. Higher-order total variations can also be used by involving multiple neighboring cells, which would enforce smoother physical fields than the first-order total variation regularization. High-order total variation regularization [32] is often achieved by combining the first-order and high-order total variations with adaptive functions. Using only high-order total variations makes it difficult to capture sharp interfaces, e.g., near a shock wave or the edge of a backward-facing step, leading to over-smoothed fields. Conversely, using only the first-order total variation may not smoothen the field sufficiently or may lead to staircase artifacts [32], i.e., piecewise-constant regions. Combining different total variations can remedy these deficiencies, with an adaptive function used to lessen the action of the high-order term where the first-order total variation is large. However, the cost function for high-order total variation regularization differs from that of the present method, as it includes an additional adaptive regularization term, and the update scheme would need to be reformulated. Moreover, the high-order total variation would pose extra difficulties in finding neighboring cells near the subdomain interfaces in practical applications, particularly for unstructured meshes. Therefore, high-order total variation regularization is not used in the present work.
High-performance computing (HPC) strategies such as the Schur complement and the message passing interface (MPI) can be used to reduce memory storage and improve computational efficiency for the conventional EnKI method. Nevertheless, the proposed method has one noticeable advantage through its domain localization, in contrast to the EnKI method empowered by HPC techniques. Specifically, the EnKI method often encounters spurious correlation issues due to the low-rank covariance approximation with limited samples. That is, the limited samples may provide an incorrect covariance matrix with strong correlations between spatially distant points. The parallel EnKI methods perform a local Kalman analysis based on the domain decomposition, which updates the local latent field with observations at neighboring cells and eliminates the effects of the spurious correlations. Such domain localization is of critical importance for inferring large-scale fields with limited samples [33], but it is not achieved by the EnKI method empowered with HPC strategies. Also, we note that HPC strategies can be used to further improve the computational efficiency of the present method in both the local and global analyses. In a distributed parallel environment using MPI and ScaLAPACK, the proposed method would inherit an efficiency bottleneck from the conventional parallel implementation of the ensemble Kalman method [17]: the ensemble matrix can be naturally distributed so that each process owns a block of columns, but such 1D process grids tend to be particularly inefficient because ScaLAPACK performs best on square process grids. Therefore, the ensemble matrix needs to be redistributed before the matrix linear algebra operations.


Fig. 1. Schematic of the parallel ensemble Kalman method with total variation regularization. The parallel implementation includes two steps: the global total
variation (TV) regularization as highlighted in the red/gray box and the local Kalman analysis in the black box. (For interpretation of the colors in the figure(s), the
reader is referred to the web version of this article.)

2.4. Procedure

The procedure of the parallel ensemble Kalman method with total variation regularization is summarized in this section. Given the observation data $\mathsf{y}$, the initial field $\mathsf{x}_0$, and the kernel covariance $\mathcal{K}$, the parallel ensemble Kalman method with total variation regularization proceeds through the following steps.

(a) Initial sampling: realizations of the unknown field $\mathsf{x}$ are drawn based on the initial guess $\mathsf{x}_0$ and the kernel covariance $\mathcal{K}$. The Karhunen-Loève (KL) decomposition [34] is used to generate the random samples $\{\mathsf{x}_j\}_{j=1}^{N_e}$ of the unknown field.
(b) Model forecast: for each sample, the field $\mathsf{x}_j$ is propagated to the observation space $\mathsf{H}\mathsf{x}_j$ by solving the governing equations of the physical system.
(c) Global regularization based on total variation: the entire inferred field is smoothened by the total variation regularization as in Eq. (11).
(d) Domain decomposition: the regularized field $\tilde{\mathsf{x}}$ is decomposed into non-overlapping subdomains $\{\tilde{\mathsf{x}}_{[k]}\}_{k=1}^{M}$.
(e) Local Kalman correction: on each subdomain, the Kalman correction is used to infer the local quantities $\mathsf{x}^{\mathrm{a}}_{[k]}$ from the observation data $\mathsf{y}_{[k]}$ based on Eq. (14).
(f) Assembling subfields: the updated subfields $\{\mathsf{x}^{\mathrm{a}}_{[k]}\}_{k=1}^{M}$ are assembled into the global field $\mathsf{x}^{\mathrm{a}}$.

The procedure returns to step (b) until the convergence criterion or the maximum iteration number is reached. The convergence criterion is set as $\|\mathcal{H}[\mathsf{x}] - \mathsf{y}\| < \sqrt{\operatorname{trace}(\mathsf{R})}$ based on the discrepancy principle [35]. The maximum iteration number is set to 500 in this work. A schematic of the procedure is shown in Fig. 1.
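The steps above can be sketched as a single loop. The snippet below is a simplified, self-contained illustration whose names and interfaces are ours: it draws the initial samples from the full kernel covariance rather than a truncated KL expansion, omits the TV pre-correction (step c) for brevity, and assumes a diagonal observation error covariance with pointwise observations aligned with the evenly split state.

```python
import numpy as np

def run_inversion(x0, K_cov, forward, y, r_var, n_sub=4, Ne=40,
                  max_iter=100, seed=0):
    """Sketch of the iteration loop for steps (a)-(f).

    forward : callable mapping one field sample (N,) to observations (D,)
    K_cov   : (N, N) kernel covariance used for the initial sampling
    """
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(x0, K_cov, Ne).T                 # (a) sampling
    for i in range(max_iter):
        HX = np.stack([forward(X[:, j]) for j in range(Ne)], 1)  # (b) forecast
        # stop by the discrepancy principle: ||H[x] - y|| < sqrt(trace(R))
        if np.linalg.norm(HX.mean(1) - y) < np.sqrt(r_var * len(y)):
            break
        # (d)-(f): decompose evenly, run local Kalman analyses, reassemble
        for xs, ys in zip(np.array_split(np.arange(len(x0)), n_sub),
                          np.array_split(np.arange(len(y)), n_sub)):
            Xk, HXk = X[xs], HX[ys]
            Sx = (Xk - Xk.mean(1, keepdims=True)) / np.sqrt(Ne - 1)
            Sy = (HXk - HXk.mean(1, keepdims=True)) / np.sqrt(Ne - 1)
            K = Sx @ Sy.T @ np.linalg.inv(Sy @ Sy.T
                                          + r_var * np.eye(len(ys)))
            Yk = y[ys][:, None] + rng.normal(0.0, np.sqrt(r_var),
                                             (len(ys), Ne))
            X[xs] = Xk + K @ (Yk - HXk)                          # (e) update
    return X.mean(1), i
```

With an identity forward model the loop terminates quickly via the discrepancy check, illustrating the role of the stopping criterion.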


Table 2
Summary of the case setup in the inferred quantity, observation, data amount, and the number of subdomains.

| case index | inferred quantity | observed quantity | data amount | no. of subdomains |
| Case 1 | diffusivity $D$ | temperature $T$ | $10^3$ | [1, 64] |
| Case 2 | velocity $U$ | concentration $\theta$ | $[1.2 \times 10^4, 1.2 \times 10^5]$ | 8 |
| Case 3 | eddy viscosity $\nu_t$ | velocity $U$ | $\approx 1.5 \times 10^4$ | 6 |

3. Results

We use three flow applications with increasing complexity to demonstrate the capability of the proposed parallel ensemble
Kalman method. Specifically, the three cases are the diffusion problem, scalar transport problem, and Reynolds-averaged Navier-
Stokes closure problem. They represent different scenarios where field inversion is used. The first case represents inferring material
property, i.e., diffusivity, from available observation data, i.e., temperature. The second case is to infer background flow velocity
from concentration observation. This is often encountered in velocity reconstruction based on the trajectory of passive particles
within the flows. The final case aims to infer Reynolds stresses or underlying model correction from velocity measurements, which
is of great interest for data-driven turbulence modeling [11,13].
The complexity of the three cases increases gradually from the perspective of field inversion. The first case is a one-dimensional field inversion problem with stationary observation data; here, the effect of the domain decomposition on the inference efficiency and accuracy is investigated by varying the number of subdomains. The second case is a two-dimensional field inversion with observation data at different times; here, we verify the capacity of the proposed method to handle different observation amounts, and the domain decomposition is performed only along the horizontal direction. In contrast, the third case is a 2D field inversion problem with domain decomposition along both the horizontal and vertical directions, as is often encountered in CFD applications for parallel computation. Moreover, the RANS equations are more severely nonlinear than the former two cases, which increases the difficulty of the field inversion. A summary of the test cases is presented in Table 2.
The criteria for domain splitting follow three principles. The first is a balanced computational load: each subdomain should have an equal number of mesh cells so as to share a similar computational burden. Second, the subdomains should have as few interfaces as possible, which reduces the effect of the missing covariance information near the interfaces on the inference accuracy. Third, the size of a subdomain should be larger than the correlation length of the latent field. Domain-splitting strategies that violate these criteria can significantly degrade the inversion efficiency and accuracy. For instance, splitting into subdomains with different numbers of mesh cells reduces the inversion efficiency due to the unbalanced computational burden. Splitting the domain with extra subdomain interfaces increases the inversion error due to the covariance information missing near the additional interfaces. Furthermore, subdomain sizes smaller than the correlation length would induce significant inversion errors, as the covariance information is severely lost.
The open-source library OpenFOAM [36] is used to solve the governing equations of the 2D scalar transport problem and the
RANS closure problem. The DAFI library [37] is used to implement the ensemble-based Kalman method. All the numerical tests in
this work are conducted on a workstation with 64 processors and 64 gigabytes of RAM.

3.1. 1D diffusion problem

The first case is a 1D diffusion problem where the unknown spatially varying parameter is the diffusivity and the observation data
are the resulting temperature field. We aim to infer the diffusivity field from the temperature observations. For this 1D field inversion
case, we divide the computational domain into between 2 and 64 subdomains. In this work, the number of processes equals the
number of subdomains, so that the local Kalman analysis within each subdomain can be performed on a separate process
independently. In doing so, we can test the parallel efficiency and inversion accuracy of the ensemble Kalman method with
different numbers of processes.
The 1D diffusion equation can be formulated as
\[
-\frac{d}{dx}\left( D[x] \frac{dT}{dx} \right) = f[x], \tag{15}
\]
where 𝑥 is the spatial coordinate, 𝑇 is the quantity being diffused, e.g., temperature, 𝑓 [𝑥] is the heat source, and 𝐷[𝑥] is the unknown
diffusivity field. The heat source 𝑓 [𝑥] is prescribed as 𝑓 [𝑥] = sin(0.2𝜋𝑥∕𝐿), where 𝐿 is the length of the domain. The homogeneous
boundary condition is employed as 𝑇 |𝑥=0 = 𝑇 |𝑥=𝐿 = 0. This test case aims to infer the spatial distribution of diffusivity 𝐷 from the
observation data of the temperature 𝑇 .
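As an illustration of this forward model, Eq. (15) can be discretized with standard second-order finite differences (the paper itself solves it with OpenFOAM; the grid size, function names, and the node-value approximation of the boundary-face diffusivity below are our assumptions):

```python
import numpy as np

def solve_diffusion(D, f, L=1.0):
    """Solve -(D T')' = f on [0, L] with T(0) = T(L) = 0.
    D, f hold values at the n interior nodes of a uniform grid."""
    n = len(D)
    h = L / (n + 1)
    Dface = 0.5 * (D[:-1] + D[1:])          # diffusivity at interior faces
    A = np.zeros((n, n))
    main = np.empty(n)
    main[0] = D[0] + Dface[0]               # boundary faces: node-value approx.
    main[-1] = Dface[-1] + D[-1]
    main[1:-1] = Dface[:-1] + Dface[1:]
    idx = np.arange(n)
    A[idx, idx] = main
    A[idx[:-1], idx[1:]] = -Dface           # upper diagonal
    A[idx[1:], idx[:-1]] = -Dface           # lower diagonal
    return np.linalg.solve(A / h**2, f)

x = np.linspace(0.0, 1.0, 102)[1:-1]        # 100 interior nodes
T = solve_diffusion(np.ones_like(x), np.sin(0.2 * np.pi * x))
```

For constant D the scheme reduces to the familiar tridiagonal Laplacian; the face-averaged diffusivity retains second-order accuracy for spatially varying D.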

3.1.1. Case details


The KL decomposition [34] is used to generate initial samples. Specifically, the field is constructed as the linear combination of
KL modes 𝜙 and the coefficients 𝜔. It can be formulated as
\[
\log(D/D_0) = \sum_{i=1}^{N} \omega_i \phi_i, \tag{16}
\]

X.-L. Zhang, L. Zhang and G. He Journal of Computational Physics 509 (2024) 113059

Fig. 2. Plots of prior diffusivity and temperature fields with the EnKI method, the PEnKI method, and the PEnKI-TV method compared to the baseline and the truth
for the diffusion case.

where D_0 = 1 is used as the baseline value, and φ_i are the weighted eigenvectors of the model-error covariance kernel 𝒦. The logarithm
ensures the non-negativity of the diffusivity. The covariance kernel 𝒦 for two spatial points x and x' is written as
\[
\mathcal{K}(x, x') = \sigma_p^2 \exp\left( -\frac{\| x - x' \|^2}{l_s^2} \right), \tag{17}
\]
where σ_p is the variance and l_s is the correlation length. In this case, the variance σ_p and the length scale l_s are set to 0.5 and 0.02,
respectively. The diffusivity reconstructed with ω_1 = 1 and ω_i = 0 for i > 1 is used as the synthetic truth, and the propagated
temperature is used as the observation data. The spatial domain is discretized into 10³ cells. We test numbers of processes ranging from
2 to 64 for the parallel ensemble Kalman methods. The relative observation error is set to 0.0001, and the ensemble size is 100,
based on the sensitivity study presented in Appendix B.
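The sample-generation procedure of Eqs. (16)–(17) amounts to an eigendecomposition of the covariance kernel followed by random linear combinations of the weighted modes. A minimal sketch (illustrative names; the paper relies on the DAFI library for this step):

```python
import numpy as np

def kl_samples(x, sigma_p, l_s, n_modes, n_samples, D0=1.0, seed=0):
    """Draw log-normal diffusivity samples D = D0 * exp(sum_i omega_i * phi_i),
    where phi_i are the leading KL modes of the squared-exponential kernel."""
    K = sigma_p**2 * np.exp(-((x[:, None] - x[None, :]) ** 2) / l_s**2)
    vals, vecs = np.linalg.eigh(K)                 # ascending eigenvalues
    lead = np.argsort(vals)[::-1][:n_modes]        # keep the leading modes
    phi = vecs[:, lead] * np.sqrt(np.maximum(vals[lead], 0.0))  # weighted modes
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((n_modes, n_samples))  # omega ~ N(0, 1)
    return D0 * np.exp(phi @ omega)                # shape (len(x), n_samples)

x = np.linspace(0.0, 1.0, 200)
D_samples = kl_samples(x, sigma_p=0.5, l_s=0.02, n_modes=50, n_samples=100)
```

The exponential guarantees positive diffusivity samples, mirroring the logarithmic parameterization of Eq. (16).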
The prior samples are plotted in Fig. 2 with comparison to the baseline and truth. It can be seen that the prior sample mean has
noticeable discrepancies from the synthetic truth, while the range spanned by the initial samples can cover the true diffusivity and
temperature fields. The sample mean has good agreement with the baseline since the samples are drawn from the prior distribution
with the baseline as the mean. However, there are slight discrepancies between the baseline and the sample mean, likely due to
sampling errors.

3.1.2. Results
The inferred diffusivity with the proposed PEnKI-TV method exhibits the best agreement with the synthetic truth. The results of
the inferred diffusivity are shown in Fig. 3 with a comparison among the EnKI, PEnKI, PEnKI-TV methods, and the synthetic truth.
The domain is decomposed into 32 subdomains in the present results. The predicted temperature field has no significant differences
among the three methods, and all have good agreement with the ground truth. Hence the plots of temperature results are omitted
for brevity. Generally, the EnKI and PEnKI-TV methods can provide better inversion results of diffusivity compared to the PEnKI
method. The PEnKI method leads to noticeable discontinuity near the subdomain interfaces, while the EnKI and proposed PEnKI-TV
methods can achieve a good agreement with the ground truth. Particularly, the PEnKI-TV method can alleviate the discontinuity
near the interface between subdomains with the total variation regularization.
It is also noted that the inferred diffusivity with the baseline EnKI method has slight discrepancies around 𝑥∕𝐿 = 0.6. This is
likely due to the local diminishing temperature gradient near 𝑥∕𝐿 = 0.6 as shown in Fig. 2, which leads to the insensitivity of the
temperature to the variation of the diffusivity field. The degree of the ill-posedness of the inverse problems can be evaluated based
on operator analysis [38–40], while it is out of the scope of the present work and worthy of further investigation.
We decompose the computational domain into different subdomains and employ the parallel ensemble Kalman method with the
corresponding number of processes. The inference errors with different inversion methods are presented in Fig. 4(a). Specifically, the
inference error with the ensemble Kalman inversion method is about 0.53%. The PEnKI method generally leads to larger inference
errors compared to the EnKI method, except for 8 and 16 processes.
For the cases with 8 and 16 processes, the EnKI method leads to relatively poor inversion results compared to the PEnKI method.
That is likely because the EnKI method encounters the spurious correlation issue [33] that leads to strong correlations among
spatially far-distant points and further induces inference errors. The parallel EnKI method is able to alleviate the effects of the
spurious correlation and improve the inversion accuracy with domain localization. It is achieved by updating the local field with
observation at the corresponding subdomains. The proposed PEnKI-TV method can further reduce the inference error and provide more
accurate inference than the baseline EnKI method, except when using 64 processes. The improvement arises because the total variation
regularization alleviates the ill-posedness issue, as shown in Fig. 3. Specifically, for the cases with 2 and 4 processes,
the inference error can be caused by both spurious correlation and ill-posedness, which can be alleviated by the domain localization
and regularization, respectively, in the PEnKI-TV method. With 8 and 16 processes, the inference error seems mainly caused by the
spurious correlation and is reduced significantly with the domain localization in both parallel methods. With 32 and 64 processes,


Fig. 3. Plots of inferred diffusivity fields with the EnKI method, the PEnKI method, and the PEnKI-TV method compared to the ground truth for the diffusion case.

Fig. 4. Inference errors and computational cost with different processes for the 1D diffusion case. The wall time shows the computational time for one inversion step.

the inversion error with the parallel methods becomes pronounced due to the severely lost covariance information, particularly for
64 processes, as the size of the subdomain is less than the prescribed correlation length 𝑙𝑠 .
As for the computation cost, the parallel implementation can significantly decrease the wall time for field inversion compared to
the EnKI method. The wall time of the EnKI method, the PEnKI method, and the PEnKI-TV method with different numbers of processes
is shown in Fig. 4(b). The computational efficiency of the PEnKI method is similar to that of the PEnKI-TV method for this case. It can be
clearly seen that, with 64 processes, the parallel ensemble Kalman methods accelerate the inversion by more than 10 times
compared to the EnKI method. Specifically, the conventional ensemble Kalman method uses only one process, and one inversion step
requires 48 seconds. In contrast, the proposed parallel EnKI with total variation regularization reduces the wall time to 4 seconds
with 64 processes. The wall time of the PEnKI method with 4 processes is only around one-third of the EnKI result, rather than the
ideal one-quarter. That may be caused by the extra computational time consumed in partitioning and reassembling the matrices,
which affects the parallel efficiency. The inversion efficiency of the EnKI method could also be improved by exploiting
multi-processing capabilities. However, the inversion accuracy would remain similar to the present EnKI results; that is, the EnKI
results represent an upper bound for an HPC-accelerated EnKI method. In contrast, the PEnKI method applies a localized Kalman
analysis based on domain decomposition, which improves not only the inversion efficiency but also the accuracy by alleviating the
effects of spurious correlations through domain localization.

3.2. 2D scalar transport problem

The second test case is an unsteady 2D field inversion problem concerning advection–diffusion processes. Here we show the
capability of the proposed method in handling unsteady cases with different amounts of observation data. The 2D scalar transport
equation can be formulated as
\[
\frac{\partial \theta}{\partial t} + \nabla \cdot (\boldsymbol{U} \theta) = \nabla \cdot (D \nabla \theta), \tag{18}
\]
where t is time, θ is the concentration field, U is the stationary background velocity to be inferred, and D is the diffusivity, taken
as the constant value 0.01 in this case. The concentration field θ is transported by the stationary background velocity U through
advection and diffusion. The goal is to infer the background velocity U from the concentration field θ observed at different times t. This
case represents field inversion problems in the unsteady scenario, which can involve large amounts of time-dependent observation data.


Fig. 5. Schematic of domain decomposition for the scalar transport case.

Fig. 6. Plots of prior velocity and concentration fields with comparison among samples, baseline, and the synthetic truth for the scalar transport case.

3.2.1. Case details


For this case we employ the flow over a backward-facing step, as shown in Fig. 5, which has been widely used for numerical
validation in CFD applications. The computational domain is discretized with 12000 cells and decomposed into 8 subdomains in this
case. The schematic of the domain decomposition is presented in Fig. 5. Specifically, the inlet and convergent parts of the channel
are treated as individual subdomains. Further, the middle channel section is decomposed into 6 subdomains along the horizontal
direction with an equal number of mesh cells.
The initial samples are drawn based on the KL expansion [41]. The velocity field is formulated as
\[
\boldsymbol{U} = \boldsymbol{U}_0 + \sum_{i=1}^{N} \omega_i \phi_i, \tag{19}
\]

where U_0 is the baseline velocity, and φ_i are the eigenvectors of the kernel function 𝒦 for the KL decomposition. The kernel function 𝒦
can be written as
\[
\mathcal{K} = \sigma_p^2 \exp\left( -\frac{\| x - x' \|^2}{l_x^2} - \frac{\| y - y' \|^2}{l_y^2} \right), \tag{20}
\]
where 𝑙𝑥 and 𝑙𝑦 are the streamwise and vertical correlation lengths, respectively. We set the correlation length as 𝐻 and 0.2𝐻 for 𝑥
and 𝑦-axis directions, respectively, where 𝐻 is the height of the step. The standard deviation 𝜎𝑝 is set as 1.0 in this case.
The prediction from the k–ω model is used as the baseline U_0. The synthetic truth is constructed based on the KL modes φ
with given mode coefficients ω, i.e., ω_1 = 2 and ω_i = 0 for i > 1. The concentration θ transported by the true velocity field is used
as the observation data. We test observations of the concentration θ with different numbers of time slices in the range t/T ∈ [0, 2],
where T is the flow-through time. The initial samples are constructed with random coefficients ω drawn from the standard normal
distribution. The relative observation error is set to 0.001, and 50 samples are used in this case, based on the sensitivity study
presented in Appendix B.
The prior plots are presented in Fig. 6 with comparison to the baseline and the truth. It can be seen that there exist noticeable
discrepancies between the baseline and the ground truth. The mean of the samples agrees well with the baseline, which is
reasonable as the samples are drawn from the probability distribution with the baseline as the mean. For the propagated concentration,
the sample mean has slight discrepancies from the baseline, likely due to the uncertainty propagation.
The initial samples do not encompass the truth, mainly due to the large discrepancies in the baseline model prediction. One can
increase the standard deviation 𝜎𝑝 , particularly in regions having large inversion errors, to enlarge the coverage of the samples [11].
However, the baseline errors in the inferred field are not known a priori. Also, large standard deviations may lead to samples
with nonphysical values, such as negative mainstream velocity in this case, and in turn to solver divergence. Hence, without loss of
generality, a uniform standard deviation is used in all test cases of this work.

3.2.2. Results
The contour plots of the inferred velocity are provided in Fig. 7 with three different time slices of 𝑡∕𝑇 = 0.33, 1, 1.67. It can be
seen that the baseline velocity 𝑼 0 , particularly the vertical velocity 𝑈𝑦 , has large discrepancies from the truth in the magnitude. The
conventional ensemble Kalman method can accurately infer the streamwise velocity field but leads to a noisy field in the wall-normal


Fig. 7. Contour plots of velocity inferred with different ensemble methods with comparison to the baseline and truth for the advection-diffusion problems.

velocity U_y. Such noise is mainly due to the ill-posedness of the inverse problem. That is, small differences in the velocity do
not make much difference in the model prediction of the concentration field. The inferred velocity fields with the PEnKI method also
suffer from this noise issue. Moreover, this method leads to noticeable sharp interfaces between the subdomains, e.g., Ω1 and Ω2, due to
the missing covariance information. In contrast, the proposed PEnKI-TV method can improve the velocity inference by smoothening
the global field. On the one hand, the proposed method employs total variation regularization to suppress the discontinuity near the
interface between subdomains. On the other hand, it also alleviates the ill-posedness issues and denoises the inferred wall-normal
velocity.
To clearly show the improvement of the inference, we compare the velocity results along profiles with different ensemble methods.
The reconstructed concentration fields with the EnKI method, the PEnKI method, and the PEnKI-TV method are similar and hence
the plots are omitted for brevity. The velocity plots along profiles are shown in Fig. 8. The conventional EnKI method can infer the
streamwise velocity accurately, while the reconstructed wall-normal velocity has noticeable noise near the center of the channel
due to the ill-posedness of the inverse problem. That is, such velocity variations make no significant difference to the concentration
field 𝜃 . As for the PEnKI method, the inferred results lead to large discrepancies near the bottom wall for the streamwise velocity.
Moreover, the wall-normal velocity also exhibits nonsmoothness due to the ill-posedness issue as well as the missing covariance
near the subdomain interfaces. In contrast, the proposed PEnKI-TV method can provide wall-normal velocity fields with remarkable
accuracy and smoothness.
Both domain localization and regularization can reduce the inversion noise in the vertical velocity. It is evident in Fig. 7, which
shows that both the PEnKI and PEnKI-TV methods suppress the noise in the inferred vertical velocity, compared to the EnKI method.
The PEnKI method can suppress the inference noise with domain localization, while the PEnKI-TV method leverages both domain
localization and regularization techniques. We compute the total variation of the inferred velocity fields with the EnKI, PEnKI, and
PEnKI-TV methods to show the contribution of domain localization and regularization in promoting field smoothness for this case.
Based on our results, the EnKI method leads to the total variation of streamwise velocity to be 91.27 and vertical velocity to be 25.13.
The PEnKI method increases the total variation of streamwise velocity to 110.32 while reducing the total variation of vertical velocity
to 17.18. The reduction of the total variation in the vertical velocity is likely due to the domain localization which eliminates the
effects of spurious correlation. The increase in the total variation of the streamwise velocity field is caused by the missing covariance
information near the subdomain interface. In contrast, the PEnKI-TV method reduces total variations of both the streamwise velocity
and vertical velocity to 95.13 and 13.51, respectively. Therefore, the regularization can alleviate the effects of the lost covariance


Fig. 8. Plots of inferred background velocity along profiles with different ensemble methods compared to the baseline and truth for the scalar transport case.

information on the total variation of the streamwise velocity field and further suppress the inference noise in the vertical velocity,
compared to the PEnKI method.
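The total-variation numbers quoted above are, in principle, the sum of absolute differences between neighboring cells. A sketch for a field stored on a uniform 2D array (the paper's mesh is body-fitted, so this illustrates the metric rather than the paper's exact computation):

```python
import numpy as np

def total_variation(field):
    """Anisotropic discrete TV of a 2D array: the sum of absolute
    differences between horizontally and vertically adjacent cells."""
    return (np.abs(np.diff(field, axis=1)).sum()
            + np.abs(np.diff(field, axis=0)).sum())

smooth = np.tile(np.linspace(0.0, 1.0, 11), (11, 1))      # smooth ramp, TV = 11
rng = np.random.default_rng(0)
noisy = smooth + 0.1 * rng.standard_normal(smooth.shape)  # noise inflates TV
```

A smooth ramp and a noisy version of it span the same end-to-end range, but the noise inflates the discrete TV, which is exactly what the regularization penalizes.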
We investigate the effects of the data amount on the accuracy and efficiency of the different ensemble methods, i.e., the EnKI method,
the PEnKI method, and the proposed PEnKI-TV method. Different data amounts are investigated, including 36000, 60000, 96000, and
120000, which correspond to 3, 5, 8, and 10 time slices evenly distributed from t = 0 to 2T. The computational time used for the field
inversion is summarized in Table 3 with a comparison among the EnKI method, the PEnKI method, and the PEnKI-TV method. It can
be seen that both the PEnKI method and the PEnKI-TV method accelerate the inversion by more than 10 times. Specifically,
for the case with 36000 data points, the EnKI method requires 20800 seconds for one inversion step, while the PEnKI and PEnKI-TV
methods need 340 and 380 seconds, respectively. For larger data amounts, the EnKI method runs out of memory, while
the parallel ensemble methods can still infer the velocity fields accurately. The analysis step of the PEnKI-TV method is slightly
slower than that of the PEnKI method due to the additional global regularization step.
The inference accuracy of the different ensemble methods is also provided in Table 3. It can be seen that the EnKI method
reconstructs the velocity and concentration fields well. The parallel ensemble Kalman method significantly increases the inference
efficiency but leads to relatively large discrepancies in both velocity and concentration. In contrast, the proposed ensemble Kalman
method with total variation regularization improves the accuracy of the reconstructed velocity and concentration at a similar
computational cost.


Table 3
Summary of inference accuracy and computation cost with the
EnKI method, the PEnKI method, and the PEnKI-TV method for
the scalar transport case.

dim(y)    Method      error(U)   error(θ)   Wall time
36000     EnKI        1.03%      1.22%      20800 s
          PEnKI       3.17%      4.91%      340 s
          PEnKI-TV    2.03%      4.33%      380 s
60000     EnKI        –          –          –
          PEnKI       2.80%      4.35%      1300 s
          PEnKI-TV    1.66%      4.13%      1400 s
96000     EnKI        –          –          –
          PEnKI       6.88%      6.18%      4800 s
          PEnKI-TV    1.11%      3.41%      5000 s
120000    EnKI        –          –          –
          PEnKI       6.39%      6.46%      9600 s
          PEnKI-TV    1.06%      3.20%      9800 s

Fig. 9. Schematic of domain decomposition for the RANS closure problems.

3.3. RANS closure problem

In the final case, we test the ensemble Kalman methods for field inversion of practical interest in turbulence modeling, i.e., the
RANS closure problem. The RANS method is widely used in various fluid engineering applications, but its accuracy highly depends
on the modeling of the Reynolds stress. Therefore, it is of significant interest to infer the Reynolds stress from available observations
such as velocity. Here we consider the linear eddy viscosity model to estimate the Reynolds stress. As such, the scalar field, i.e., eddy
viscosity, needs to be inferred, instead of the Reynolds stress tensor field. The RANS equations can be written as

\[
\nabla \cdot \boldsymbol{U} = 0, \qquad
\boldsymbol{U} \cdot \nabla \boldsymbol{U} = -\nabla p + (\nu + \nu_t) \nabla^2 \boldsymbol{U}, \tag{21}
\]
where 𝑼 is velocity, 𝑝 is a pseudo pressure term, 𝜈 is the fluid viscosity, and 𝜈t is the eddy viscosity field to be inferred. This case
aims to infer the eddy viscosity from the velocity data in the scenario of having severe model nonlinearity.

3.3.1. Case details


The flow over periodic hills is used as the test case, which is representative for evaluating closure models in separated
flows. Periodic boundary conditions are imposed at the inlet and outlet. The domain is discretized with 149 cells in the streamwise
direction and 99 cells in the wall-normal direction. The dimensionless wall distance y⁺ of the first cell is small enough to avoid
involving wall functions. The Reynolds number based on the height of the hill crest H and the inlet velocity U_b is 2800.
The computational domain is decomposed into 6 subdomains as shown in Fig. 9. In this case, the domain is decomposed along
both the horizontal and vertical directions, and each subdomain has an equal number of mesh cells and a size larger than the
correlation length scale. This is in contrast to the scalar transport case, where the domain is decomposed only along the horizontal
direction, and it demonstrates the capability of the method to alleviate the effects of the missing covariance under such a
domain-splitting strategy. Moreover, this domain decomposition is often used in CFD applications with parallel computing. It poses
challenges for field inversion with the parallel ensemble Kalman method since the covariance information along both directions is
distorted.
The predictions with the k–ω model [42] are taken as the baseline results. The truth is obtained from a RANS prediction with a
different turbulence model, i.e., the k–ω SST model [43]. The k–ω model is chosen as the baseline because its formulation is similar
to that of the true model, i.e., the k–ω SST model. The main difference between the two models is that the k–ω SST model blends
into the k–ε model in the free stream outside the boundary layer by modifying the transport equation of the specific dissipation
rate. Hence, the discrepancies between the baseline eddy viscosity and the truth are mainly located away from the wall, where the
velocity prediction is insensitive to the eddy viscosity, which necessitates the use of regularization to improve the inference
accuracy. The initial samples are generated based on the KL expansion as


Fig. 10. Plots of prior eddy viscosity and velocity along profiles with comparison among samples, baseline, and truth for the periodic hill case.

Fig. 11. Plots of velocity fields with comparison among baseline, the ground truth, and the inferred results of the EnKI, PEnKI, and PEnKI-TV methods for the RANS
closure case.

\[
\log(\nu_t / \nu_0) = \sum_{i=0}^{M} \omega_i \Phi_i, \tag{22}
\]

where the logarithm ensures the non-negativity of the eddy viscosity. The formulation of the KL decomposition is consistent
with the scalar transport case. The relative sample variance σ_p in the kernel function 𝒦 is set to 1.0, and the correlation length is
0.2H, where H is the height of the crest. The relative observation error is set to 0.0001, which represents strong confidence in the
observation data. 50 samples are used in this case, based on the sensitivity study presented in Appendix B.
The prior plots of the eddy viscosity and the velocity are presented in Fig. 10. It can be seen that the sample mean fits the
baseline well, similar to the former two cases. On the hill, particularly in regions with favorable pressure gradients, the range
spanned by the samples is not able to cover the synthetic truth. One could choose a better baseline model or increase the standard
deviation σ_p in the regions having large inference discrepancies to enlarge the coverage of the samples. However, this is not practical
as the inference errors are often not known a priori. Also, we note that the inferred field lies within the subspace linearly spanned
by the samples [7], while it is not constrained by the initial sample coverage in the physical space. The posterior samples can deviate
considerably from the initial sample coverage due to the repeated usage of observation data, which is also observed in the literature [12].
An initial ensemble that covers the truth in the physical space could improve the inference efficiency and accuracy but is not a necessity.
The propagated velocity has a relatively small variation that can cover the truth. The sample mean also agrees with the baseline
velocity and deviates from the truth mainly near the bottom wall.

3.3.2. Results
All the inversion methods, including the EnKI, PEnKI, and PEnKI-TV, can improve the velocity reconstruction significantly. This can
be observed from the contour plots of velocity shown in Fig. 11. The baseline underestimates the separation bubble size compared
to the synthetic truth. In contrast, all the ensemble methods noticeably improve the estimate of the separation bubble.
As for the inferred eddy viscosity, the proposed PEnKI-TV method achieves accuracy comparable to the EnKI method. This is
supported by the contour plots of the inferred eddy viscosity presented in Fig. 12. It can be seen that the baseline generally
overestimates the eddy viscosity compared to the truth. The conventional EnKI method can provide the inferred eddy viscosity in
good agreement with the truth in most areas, except near the throat. The PEnKI method leads to inferior results compared to the EnKI


Fig. 12. Contour plots of eddy viscosity fields with comparison among baseline, ground truth, and the inferred results with EnKI, PEnKI, and PEnKI-TV methods for
the periodic hill case.

Table 4
Summary of the errors and computational costs
with the three ensemble-based inversion meth-
ods for the RANS closure case.

Method       EnKI     PEnKI    PEnKI-TV
error(U)     1.71%    5.65%    2.35%
error(ν_t)   29.5%    52.0%    34.3%
Wall time    300 s    30 s     33 s

method, which exhibits significant discontinuity near the interface between subdomains. In contrast, the PEnKI-TV method provides
better agreement with the truth and alleviates the discontinuity issues by penalizing the spatial gradient of the eddy viscosity field.
To clearly show the inference improvement, the inferred eddy viscosity along profiles is compared among the EnKI method, the
PEnKI method, and the PEnKI-TV method, together with the baseline and the truth. The reconstructed velocity fields with the three
methods are in good agreement with the true velocity, and hence the plots are omitted for brevity. The plots of the eddy viscosity
with the three methods along profiles are provided in Fig. 13. It can be seen that the inferred eddy viscosity with the EnKI method
has remarkable agreement with the truth, although slight differences exist, mainly in the regions with non-zero pressure
gradients. The large discrepancies on the hill are likely due to the limited subspace spanned by the samples. The evidence is shown
in the prior plots in Fig. 10, where the truth is far from the range spanned by the initial samples at the positions x/H = 0 and 8.
The ensemble method searches for the optimal solution within the subspace spanned by the initial samples, which leads to the
noticeably large discrepancy on the hill. It is noted that this limitation is inherited from the conventional EnKI method and needs
further investigation, which is beyond the scope of the present work. The PEnKI method can also improve the accuracy of the inferred
eddy viscosity compared to the baseline results. However, it leads to significant discrepancies near the subdomain interfaces
due to the missing covariance information. In contrast, the proposed PEnKI-TV method achieves accuracy comparable to the
conventional EnKI method. Compared to the PEnKI method, the proposed method improves the accuracy of the eddy viscosity near
the subdomain interfaces by penalizing the total variation of the entire field.
The parallel implementation significantly reduces the computational cost compared to the conventional EnKI method. The
computational time used for the field inversion is shown in Table 4. It can be seen that the parallel ensemble methods accelerate
the inversion by 10 times. Specifically, the EnKI method needs a wall time of 300 seconds for one inference step. The PEnKI
method and the PEnKI-TV method reduce the computational cost to around 30 and 33 seconds, respectively. The extra time is
consumed by the additional regularization step in the proposed PEnKI-TV method.

4. Conclusion

In this work we propose a parallel implementation of the ensemble Kalman method with total variation regularization
for large-scale field inversion problems. A non-overlapping domain decomposition is employed to divide the entire field to be
inferred into multiple subdomains. Further, a local ensemble Kalman update is performed at each subdomain to incorporate
regional observation data in parallel. The total variation regularization is used to alleviate the effect of the missing covariance
information near the subdomain interfaces. The combination of the domain decomposition and the total variation regularization is
able to accelerate the inference convergence and ensure the accuracy of the inferred field simultaneously. Three numerical tests
with increasing complexity are used to demonstrate the efficiency and accuracy of the proposed method: the diffusion problem, the
scalar transport problem, and the RANS closure problem. All the results show that the proposed parallel ensemble Kalman method
with total variation regularization significantly reduces the computational cost compared to the ensemble Kalman method.
Moreover, the method is

Fig. 13. Plots of eddy viscosity along profiles with comparison among baseline, ground truth, and the inferred results with the EnKI, PEnKI, and PEnKI-TV methods
for the periodic hill case.

able to improve the accuracy of the inferred fields compared to the conventional parallel ensemble Kalman method by enforcing the
total variation regularization.

CRediT authorship contribution statement

Xin-Lei Zhang: Conceptualization, Investigation, Methodology, Software, Validation, Writing – original draft, Writing – review
& editing. Lei Zhang: Conceptualization, Investigation, Methodology. Guowei He: Conceptualization, Funding acquisition, Supervision,
Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgements

The authors are supported by the NSFC Basic Science Center Program for “Multiscale Problems in Nonlinear Mechanics” (No.
11988102). XLZ also acknowledges support from the National Natural Science Foundation of China (No. 12102435), the China
Postdoctoral Science Foundation (No. 2021M690154), and the Young Elite Scientists Sponsorship Program by CAST (No. 2022QNRC001).
The authors would like to thank the reviewers for their constructive and valuable comments, which helped improve the quality and
clarity of this manuscript.

Appendix A. Computational complexity

The computational complexity of the ensemble Kalman method is provided in this section. The complexity of each matrix
manipulation in the update scheme is presented in Table A.5. The matrix inversion in the EnKI method leads to a computational
complexity of O(D³) in scenarios with large observation data amounts, i.e., D ≫ N_e. This poses a difficulty for the ensemble method

Table A.5
Computational complexity of update schemes in the conventional ensemble Kalman method.

matrix manipulation                                      computational complexity
𝖧𝖯𝖧⊤ = (1∕(𝑁𝑒 − 1)) 𝖧Δ𝖷(𝖧Δ𝖷)⊤                            𝑂(𝐷²𝑁𝑒)
(𝖧𝖯𝖧⊤ + 𝖱)⁻¹                                             𝑂(𝐷³)
𝖠 = (𝖧𝖯𝖧⊤ + 𝖱)⁻¹(𝗒 − 𝖧𝗑)                                 𝑂(𝐷²𝑁𝑒)
𝖡 = (𝖧Δ𝖷)⊤𝖠                                              𝑂(𝐷𝑁𝑒²)
Δ𝖷𝖡                                                      𝑂(𝑁𝑁𝑒²)
𝛿𝗑 = Δ𝖷(𝖧Δ𝖷)⊤(𝖧𝖯𝖧⊤ + 𝖱)⁻¹(𝗒 − 𝖧𝗑)                        𝑂(𝐷³ + 𝐷²𝑁𝑒 + 𝐷𝑁𝑒² + 𝑁𝑁𝑒²)

Table A.6
Computational complexity of the regularization step in the proposed PEnKI-TV method.

matrix manipulation                                      computational complexity
𝖠 = 𝖶⁻¹𝖦[𝗑]                                              𝑂(𝑁𝑁𝑒)
𝖡 = (Δ𝖦[𝖷])⊤𝖠                                            𝑂(𝑁𝑁𝑒²)
Δ𝖷𝖡                                                      𝑂(𝑁𝑁𝑒²)
𝛿𝗑̃ = Δ𝖷(Δ𝖦[𝖷])⊤𝖶⁻¹𝖦[𝗑]                                   𝑂(𝑁𝑁𝑒² + 𝑁𝑁𝑒)

Table B.7
Summary of inference accuracy and computational cost with different sample sizes for each test case: the diffusion problem (Case 1), the scalar transport problem (Case 2), and the RANS closure problem (Case 3).

Case index   sample size   inference error   wall time
Case 1       20            1.22%             38 s
             50            0.91%             40 s
             80            0.73%             46 s
             100           0.53%             48 s
Case 2       20            8.53%             20600 s
             50            1.22%             20800 s
             80            1.18%             21500 s
             100           1.15%             22500 s
Case 3       20            32.22%            280 s
             50            29.5%             300 s
             80            28.8%             430 s
             100           28.3%             480 s

in handling large observation data sets. The parallel ensemble Kalman inversion method reduces this complexity to the level of a local subdomain, i.e., 𝑂(𝑑³), by partitioning the observation data.
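A serial sketch of this partitioned update is given below. All dimensions are hypothetical, a random surrogate stands in for the observed ensemble, and details such as the treatment of covariances near subdomain interfaces are omitted; in the actual method each loop iteration runs on its own process:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, Ne, S = 200, 60, 20, 3       # S non-overlapping subdomains (hypothetical sizes)

X = rng.normal(size=(N, Ne))       # state ensemble
HX = rng.normal(size=(D, Ne))      # observed ensemble (random surrogate for H[x])
y = rng.normal(size=(D, 1))
r = 0.01 * np.ones(D)              # diagonal of the observation error covariance R

dX = X - X.mean(axis=1, keepdims=True)
dHX = HX - HX.mean(axis=1, keepdims=True)

X_new = np.empty_like(X)
state_parts = np.array_split(np.arange(N), S)   # partition of the state field
obs_parts = np.array_split(np.arange(D), S)     # matching partition of the observations
for s_idx, o_idx in zip(state_parts, obs_parts):
    # Each iteration depends only on local data, so in the parallel method
    # every subdomain performs this update independently on its own process.
    dH = dHX[o_idx]                                       # local HΔX, size d × Ne
    C = dH @ dH.T / (Ne - 1) + np.diag(r[o_idx])          # local HPHᵀ + R
    A = np.linalg.solve(C, y[o_idx] - HX[o_idx])          # O(d³) instead of O(D³)
    X_new[s_idx] = X[s_idx] + dX[s_idx] @ (dH.T @ A) / (Ne - 1)
```

Because each local solve only inverts a d × d matrix, the cubic cost scales with the subdomain data size d rather than the global data size D.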
The proposed PEnKI-TV method combines domain decomposition and total variation regularization to improve the efficiency and accuracy of the inference. Its update scheme has one additional regularization step compared to the PEnKI method, but this step does not significantly increase the computational complexity. The complexity of the regularization step is presented in Table A.6: the step has a computational complexity of 𝑂(𝑁𝑁𝑒²), which is independent of the amount of observation data.
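The regularization step of Table A.6 can be sketched as follows, using a smoothed one-dimensional first-difference surrogate for the total variation (sub)gradient 𝖦[·] and a scalar weight in place of 𝖶⁻¹ (both are illustrative assumptions, not the paper's operators):

```python
import numpy as np

rng = np.random.default_rng(2)
N, Ne = 200, 20
X = rng.normal(size=(N, Ne))        # state ensemble after the Kalman analysis step

def tv_subgradient(X, eps=1e-6):
    # Smoothed 1-D total variation (sub)gradient as a stand-in for G[x]:
    # for TV(x) = sum |x[i+1] - x[i]|, the gradient at node i is
    # sign(x[i] - x[i-1]) - sign(x[i+1] - x[i]), smoothed by eps.
    d = np.diff(X, axis=0)
    g = d / np.sqrt(d**2 + eps)
    out = np.zeros_like(X)
    out[:-1] -= g
    out[1:] += g
    return out

GX = tv_subgradient(X)
dX = X - X.mean(axis=1, keepdims=True)
dGX = GX - GX.mean(axis=1, keepdims=True)
w_inv = 10.0                        # scalar stand-in for the weight matrix W⁻¹

A = w_inv * GX                      # W⁻¹G[x]: O(NNe)
B = dGX.T @ A / (Ne - 1)            # (ΔG[X])ᵀA: O(NNe²)
delta_x_tilde = dX @ B              # ΔXB: O(NNe²), independent of D
X_reg = X - delta_x_tilde           # sign convention depends on how TV enters the cost
```

Note that no operation here touches the observation dimension 𝐷, which is why the regularization step adds only 𝑂(𝑁𝑁𝑒²) work regardless of the data amount.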

Appendix B. Sensitivity study of ensemble size

This appendix investigates the effect of the sample size on inversion accuracy. Sample sizes of 20, 50, 80, and 100 are tested for each case with the EnKI method. The inference results and computational costs are presented in Table B.7. For all cases, increasing the sample size reduces the inference error while also increasing the wall time of the inference process. For the diffusion case (Case 1), the inference error decreases from 1.22% to 0.53% as the sample size increases from 20 to 100, while the wall time barely changes since the computational cost of the model forecast is negligible. Hence, we choose a sample size of 100 for the diffusion case. For the scalar transport problem and the RANS closure problem, the inference error is also reduced by increasing the sample size, but the computational time increases noticeably. In particular, once the sample size exceeds the maximum number of processes on the workstation, i.e., 64, the computational time increases significantly, likely because the CFD simulations can no longer all run concurrently. For this reason, we choose a sample size of 50 for the last two cases to balance inference accuracy and efficiency.
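The jump in wall time once the sample size exceeds the process count can be reproduced with a crude batching model (the function and the timing values below are illustrative, not measurements from the paper):

```python
import math

def estimated_wall_time(n_samples, n_procs, t_forward, t_overhead):
    """Crude wall-time model: forward CFD runs execute in batches of at most
    n_procs concurrent processes, so each extra batch adds one full
    forward-solve time to the total."""
    batches = math.ceil(n_samples / n_procs)
    return batches * t_forward + t_overhead

# On a hypothetical 64-process workstation, 50 samples fit in one batch,
# while 80 samples require two sequential batches:
one_batch = estimated_wall_time(50, 64, t_forward=400.0, t_overhead=10.0)
two_batches = estimated_wall_time(80, 64, t_forward=400.0, t_overhead=10.0)
```

This simple model is consistent with the observation that wall time grows only mildly up to 64 samples and then rises sharply, since ceil(80/64) = 2 doubles the number of sequential forward batches.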


