
WATER RESOURCES RESEARCH, VOL. 48, W05522, doi:10.1029/2011WR011778, 2012

Efficient methods for large-scale linear inversion using a geostatistical approach

Arvind K. Saibaba1 and Peter K. Kitanidis1,2

1Institute for Computational and Mathematical Engineering, Jen-Hsun Huang Engineering Center, Stanford University, Stanford, California, USA.
2Department of Civil and Environmental Engineering, Stanford University, Stanford, California, USA.

Copyright 2012 by the American Geophysical Union. 0043-1397/12/2011WR011778

Received 20 December 2011; revised 16 March 2012; accepted 23 March 2012; published 16 May 2012.
[1] In geophysical inverse problems, such as estimating an unknown parameter field from noisy observations of dependent quantities, e.g., hydraulic conductivity from head observations, stochastic Bayesian and geostatistical approaches are frequently used. To obtain best estimates and conditional realizations, several matrix-matrix computations involving the covariance matrix of the discretized parameter field must be performed. For realistic, finely discretized three-dimensional fields, these operations as performed in conventional algorithms become extremely expensive, even prohibitive, in terms of memory and computational requirements. Using hierarchical matrices, we show how to form approximate matrix-vector products involving the covariance matrix in log-linear complexity for an arbitrary distribution of points and a wide variety of generalized covariance functions. The resulting system of equations is solved iteratively using a matrix-free Krylov subspace approach. Furthermore, we show how to generate unconditional realizations using an approximation to the square root of the covariance matrix based on Chebyshev matrix polynomials, and we use these to generate conditional realizations. We demonstrate the efficiency of our method on a few standard test problems, such as interpolation from noisy observations and contaminant source identification.
Citation: Saibaba, A. K., and P. K. Kitanidis (2012), Efficient methods for large-scale linear inversion using a geostatistical approach,
Water Resour. Res., 48, W05522, doi:10.1029/2011WR011778.

1. Introduction

[2] The characterization or imaging of the subsurface is required for the efficient discovery and utilization of natural resources, the removal of contaminants, and the monitoring and management of engineered subsurface systems such as geothermal plants, nuclear-waste repositories, and CO2 sequestration sites. In particular, in hydrogeologic and environmental applications, we are interested in imaging, among other properties, the hydraulic conductivity and porosity of geologic formations, the soil moisture content in partially saturated zones, the concentration of dissolved chemicals, and the location of nonaqueous liquid phases. In practice, subsurface imaging remains a problem fraught with difficulties. Some measurement techniques, such as well tests or well logging [Batu, 1998], are expensive and provide information with limited areal coverage; imaging between wells introduces interpolation error. Other measurements, such as surface or cross-well geophysics [Rubin and Hubbard, 2005] and hydraulic tomography [Cardiff et al., 2009], provide good areal coverage but have low resolution, i.e., they cannot identify small-scale features. Thus, the imaging of the subsurface from practically attainable data involves algebraically underdetermined inverse problems.

[3] A general method for solving such problems is known as the geostatistical approach; see, for example, Kitanidis [1995]. The approach is based on the idea of combining data with information about the structure of the function that needs to be estimated. In Bayesian and geostatistical approaches (for example, see the discussion in the work of Kitanidis [2007, 2010]), the structure of the function is represented through the prior probability density function, which in practical applications is often parameterized through variograms and generalized covariance functions. The method has found several applications because it is generally practical and quantifies uncertainty. It can generate best estimates, which can be determined in a Bayesian framework as a posteriori mean values or most probable values; measures of uncertainty, as posterior variances or credibility intervals; and conditional realizations, which are sample functions from the ensemble of the posterior probability distribution.

[4] The major challenges in the large-scale implementation of the geostatistical approach are as follows:

[5] 1. As the number of unknowns increases, the storage and computational costs involving the dense covariance matrix become overwhelming, especially on unstructured grids.

[6] 2. In certain problems with a large number of measurements, the measurement operator is not only dense; forming it explicitly would also require the repeated solution of (possibly) time-dependent partial differential equations.


In such cases, the prevailing approach of forming matrix-matrix products with the covariance matrix (even using the fast summation schemes described below) proves extremely expensive.

[7] 3. In uncertainty quantification by generating conditional realizations, an important step is generating unconditional realizations. General procedures such as Cholesky factorization are extremely expensive, and special treatment is required.

[8] In this paper, we deal sequentially with these issues. The covariance matrices, although dense, have special structure that can be exploited. They are similar to dense matrices that arise from the discretization of integral equations. Various fast summation schemes have been devised to provide matrix-vector products in O(N log^a N) for such problems, where N is the number of unknowns and a >= 0 is some integer, depending on the chosen approach. They broadly fall under the following categories: (1) tree codes like the Barnes-Hut algorithm [Barnes and Hut, 1986], (2) Fast Multipole Methods [Greengard and Rokhlin, 1987; Ying et al., 2004], and (3) hierarchical matrices and Adaptive Cross Approximation [Bebendorf, 2000; Börm et al., 2003; Grasedyck and Hackbusch, 2003; Rjasanow and Steinbach, 2007].

[9] Previously, for regular equispaced grids, the Toeplitz or block-Toeplitz structure of the covariance matrix has been exploited to accelerate matrix-vector products to O(N log N), where N is the number of unknowns. In particular, Nowak and Cirpka [2006] utilized this method to estimate hydraulic conductivity and dispersivities in a large-scale problem. However, such methods cannot easily be extended to handle nonuniform grids. Fritz et al. [2009] extended the FFT based algorithm to deal with irregularly spaced measurements, but they did not show how to extend their algorithm to general measurement operators or to the case when the underlying unknowns are not on a regular grid. An algorithm to deal with irregular grids was developed in the work of Li and Cirpka [2006], which exploited the Karhunen-Loève expansion to represent the spatial random field corresponding to a stationary covariance function. They concluded that when the covariance function is smooth, with a large correlation length, the number of terms required in the expansion will be small, so this method would be efficient compared to the FFT based approach on regular grids. However, it still suffers from the same defect as the FFT based methods: when the measurement operator is dense and the number of measurements is high, forming matrix-matrix products with the covariance matrices is expensive.

[10] Recently [e.g., Rjasanow and Steinbach, 2007; Bebendorf, 2008], Fast Multipole Methods and hierarchical matrices have gained a lot of popularity in computing fast matrix-vector products for certain classes of dense matrices. Hierarchical matrices appear ideally suited to our current situation because they are applicable to a wide variety of kernels, without much reimplementation, since they handle most operations algebraically. We choose the hierarchical matrix approach over the kernel-independent versions of the Fast Multipole Method because it is relatively easier to implement. It can also be combined effectively with iterative solvers like GMRES [Saad and Schultz, 1986] to solve linear systems involving these dense matrices.

[11] Our major contributions are as follows. This paper describes methods for the efficient computational implementation of the geostatistical approach when measurements are collected in both space and time. We focus on the linear inversion case and on cases for which the prior can be described as Gaussian. We will demonstrate that, for large-dimensional problems, these methods can reduce the computational burden by orders of magnitude compared to direct or "naive" implementations that are common in applications. Our algorithm heavily uses the hierarchical matrix approach and, thus, is not limited to regular equispaced grids, unlike FFT based algorithms. We use a matrix-free iterative solver to compute the maximum a posteriori estimate and, in order to keep the number of iterations small, we have designed an efficient preconditioner that relies on a low-rank representation of the covariance matrix, whose eigenvalues are known to decay rapidly. An important aspect of the geostatistical approach is the ability to quantify the uncertainty of the unknown field, and we do this by generating conditional realizations from the posterior distribution of the unknowns. We apply our method to interpolation from noisy observations and to the problem of determining the initial contaminant distribution from the time history of measurements from spatially distributed sensors. This problem has been discussed extensively in the following references [Akcelik et al., 2003, 2005; Flath et al., 2011; Michalak and Kitanidis, 2003].

[12] The rest of this article is organized as follows. Section 2 describes the geostatistical approach to the linear inverse problem. In section 3.1 we describe in detail the hierarchical matrix approach to compute fast matrix-vector products involving the dense covariance matrix in a black-box fashion, and section 3 utilizes these fast matrix-vector products along with a Krylov subspace solver to solve the linear inverse problem. We also discuss the implementation details regarding the preconditioner and summarize the operation count of our algorithm. Furthermore, section 4.1 presents an algorithm for computing approximate conditional realizations based on Chebyshev matrix polynomials, utilizing the fast matrix-vector products described in section 3.1. Finally, we demonstrate the performance of our algorithm in section 5.

2. Problem Formulation

[13] For the sake of completeness, we review here the problem formulation and the prevalent solution approach. Consider a function s(x), such as log conductivity, to be estimated. Focusing on a prevalent approach, the function is represented as follows:

    s(x) = \sum_{k=1}^{p} f_k(x) \beta_k + \epsilon(x),    (1)

where typically the f_k span the space of pth-degree polynomials in d variables, \beta are unknown coefficients, and the second term is a random function with zero mean, characterized by a covariance function. After discretization (e.g., using finite differences or finite element models), s(x) is represented by the vector s \in R^m, to be estimated from a set of noisy measurements y \in R^n,

    y = H s + v,  v ~ N(0, R),    (2)


where H \in R^{n x m} is called the measurement operator and v is a random n-vector corresponding to measurement error, with mean zero and covariance matrix R. We also have

    E[s] = X\beta,  E[(s - X\beta)(s - X\beta)^T] = Q,    (3)

where X is the m x p drift matrix, whose entries are computed as X_{ij} = f_j(x_i), and \beta are the p unknown drift coefficients. The entries of the covariance matrix Q are given by Q_{ij} = K(x_i, x_j), where K(., .) is a generalized covariance function, which must be conditionally positive definite. For a more detailed discussion of permissible covariance kernels, we refer the reader to the following references [Christakos, 1984; Matheron, 1973]. Here R, X, and Q are considered known and are part of a modeling choice.

[14] Some possible choices for the covariance kernel K(., .) arise from the Matérn family of covariance kernels [Stein, 1999], corresponding to an isotropic, stationary stochastic process. They are defined as K_{\alpha,\nu}(x, y) = C_{\alpha,\nu}(r), with r = ||x - y|| and

    C_{\alpha,\nu}(r) = \frac{1}{2^{\nu-1}\Gamma(\nu)} (\alpha r)^{\nu} K_{\nu}(\alpha r),  \alpha > 0, \nu > 0, r > 0,    (4)

where K_{\nu} is the modified Bessel function of the second kind of order \nu and \Gamma is the gamma function. Equation (4) takes special forms for certain parameters \nu. For example, when \nu = 0.5, C_{\alpha,\nu} corresponds to the exponential covariance function; for \nu = 0.5 + n, where n is an integer, C_{\alpha,\nu} is the product of an exponential covariance and a polynomial of order n. In the limit as \nu -> \infty, and for appropriate scaling of \alpha, C_{\alpha,\nu} converges to the Gaussian covariance.

[15] Since the prior probability distribution of the unknowns s and the probability density of the error are both Gaussian, the prior and noise pdfs can be written as follows:

    p(s | \beta) \propto exp( -\frac{1}{2} (s - X\beta)^T Q^{-1} (s - X\beta) ),
    p(y | s) \propto exp( -\frac{1}{2} (y - Hs)^T R^{-1} (y - Hs) ).    (5)

Using Bayes' theorem and assuming a uniform prior for \beta, i.e., p(\beta) \propto 1, the prior probability distribution of the unknowns p(s, \beta) and the probability distribution of the error can be combined to give the posterior probability distribution of the unknown parameters s and \beta:

    p(s, \beta | y) = \frac{p(y | s, \beta) p(s, \beta)}{p(y)} \propto p(y | s, \beta) p(s, \beta).    (6)

Plugging in the expressions for the respective terms, and using an alternative convenient notation, we find that

    p(s, \beta | y) \propto exp( -\frac{1}{2} ||s - X\beta||_{Q^{-1}} - \frac{1}{2} ||y - Hs||_{R^{-1}} ),    (7)

where ||x||_M = x^T M x is a norm when M is a positive definite matrix. The resulting posterior distribution is Gaussian as well. The posterior mean values \hat{s} and \hat{\beta} are given by the maximum a posteriori estimate, which is equivalent to solving a weighted least squares optimization problem:

    \arg\min_{s, \beta}  \frac{1}{2} ||s - X\beta||_{Q^{-1}} + \frac{1}{2} ||y - Hs||_{R^{-1}}.    (8)

[16] For n < m, it is more convenient to compute the solution of this optimization problem (8) by first obtaining the solution of the following linear system of equations,

    \begin{pmatrix} HQH^T + R & HX \\ (HX)^T & 0 \end{pmatrix} \begin{pmatrix} \hat{\xi} \\ \hat{\beta} \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix},    (9)

and then computing the resulting unknown field from the solution of the system of equations (9) by the following transformation:

    \hat{s} = X\hat{\beta} + QH^T \hat{\xi}.    (10)

We also denote \Phi := HQH^T + R and \Psi := HX, and we define the matrix A as

    A = \begin{pmatrix} \Phi & \Psi \\ \Psi^T & 0 \end{pmatrix}.    (11)

[17] A straightforward implementation to solve the maximum a posteriori (MAP) problem can be done by constructing the system of equations in O(m^2 n + m n^2 + m n p) operations and solving the dense linear system using a direct solver, such as Gaussian elimination, in O((n + p)^3) operations. The storage costs are dominated by the dense Q matrix, which is of size O(m^2). This cost can be significant for a problem of size m ~ 10^6 or higher, which can easily occur in practice. However, this difficulty is usually circumvented by computing QH^T without constructing Q. This involves n matrix-vector products with Q. Constructing the elements of QH^T in a direct fashion requires significant CPU time [Nowak et al., 2003]. Next, we discuss a method for solving these systems that relies on the fast multiplication of Q with a vector.

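Before turning to the fast solver, it helps to see the "naive" implementation of paragraph [17] end to end. The following sketch (illustrative NumPy/SciPy code, not the authors' C++/PETSc implementation) evaluates the Matérn kernel of equation (4), assembles the dense covariance matrix Q, forms the saddle-point system (9) for a sparse sampling operator H, and recovers the estimate via equation (10). The synthetic data y is a hypothetical stand-in.

```python
import numpy as np
from scipy.special import kv, gamma  # modified Bessel K_nu and Gamma function

def matern(r, alpha=2.0, nu=1.5):
    """Matern covariance C_{alpha,nu}(r) of equation (4); C(0) = 1."""
    r = np.asarray(r, dtype=float)
    c = np.ones_like(r)
    nz = r > 1e-12
    ar = alpha * r[nz]
    c[nz] = (ar ** nu) * kv(nu, ar) / (2.0 ** (nu - 1.0) * gamma(nu))
    return c

rng = np.random.default_rng(0)
m, n, p = 2000, 100, 1
pts = rng.uniform(0.0, 1.0, size=(m, 2))                    # unknowns in [0,1]^2
Q = matern(np.linalg.norm(pts[:, None] - pts[None, :], axis=-1))

idx = rng.choice(m, size=n, replace=False)                  # sensor locations
H = np.zeros((n, m)); H[np.arange(n), idx] = 1.0            # selection operator
R = 1e-4 * np.eye(n)
X = np.ones((m, p))                                          # constant drift
y = rng.standard_normal(n)                                   # placeholder data

# Naive O((n+p)^3) solve of the saddle-point system (9), then equation (10).
Phi, Psi = H @ Q @ H.T + R, H @ X
A = np.block([[Phi, Psi], [Psi.T, np.zeros((p, p))]])
sol = np.linalg.solve(A, np.concatenate([y, np.zeros(p)]))
xi, beta = sol[:n], sol[n:]
s_hat = X @ beta + Q @ (H.T @ xi)                            # equation (10)
```

The O(m^2) storage of Q in this sketch is exactly the bottleneck that the hierarchical matrix machinery of the next section removes.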

3. Solving the System

3.1. Hierarchical Matrices

[18] Hierarchical matrices [Börm et al., 2003; Grasedyck and Hackbusch, 2003; Rjasanow and Steinbach, 2007; Bebendorf, 2008] (or H-matrices, for short) are efficient data-sparse representations of certain densely populated matrices. The main idea, used repeatedly in this kind of technique, is to split a given matrix into a hierarchy of rectangular blocks and to approximate each of the blocks by a low-rank matrix. Hierarchical matrices have been used successfully in the data-sparse representation of matrices arising in the boundary element method and in the approximation of the inverse of a finite element discretization of an elliptic partial differential operator. Fast algorithms have been developed for this class of matrices, including matrix-vector products, matrix addition, multiplication, and factorization, in almost linear complexity [Börm et al., 2003].

[19] The covariance matrix Q has entries Q_{ij} = K(x_i, x_j) for a set of points {x_i}, i = 1, ..., m, and a certain class of generalized covariance functions K(., .). Using the hierarchical matrix approach, it can be shown that storing Q and multiplying Q or Q^T by an appropriately sized vector have complexity O(k m log m), where m is the number of unknowns, and that the numerical cost for approximating the whole matrix is O(k^2 m log m), where k is the blockwise rank, chosen such that the relative Frobenius norm error of each subblock is \epsilon. For a rigorous analysis, the reader is referred to Bebendorf [2000] and Bebendorf and Rjasanow [2003]. Thus, the H-matrix approximation satisfies

    ||Q - Q_H||_F <= \epsilon ||Q||_F.    (12)

Our implementation closely follows the methods described in the following references [Rjasanow and Steinbach, 2007; Bebendorf, 2005]: the construction of the cluster tree and block-cluster tree is based on the bisection method described in the work of Rjasanow and Steinbach [2007], whereas the low-rank approximation of the subblocks is computed using the partially pivoted Adaptive Cross Approximation algorithm [Bebendorf and Rjasanow, 2003; Bebendorf, 2000, 2008].

3.2. Iterative Solver

[20] The system of equations (9) is solved iteratively using a Krylov subspace method. The key advantage of Krylov subspace methods is that they never require the explicit entries of the matrix; they rely only on matrix-vector products involving the matrix or its transpose. We review the basic properties of Krylov subspace methods that help us understand the convergence of the algorithms. These methods, when applied to the linear system of equations Ax = b, satisfy the following relation for the residual at the ith iteration:

    r_i = \phi(A) r_0 \in span{r_0, A r_0, A^2 r_0, ..., A^i r_0},  \phi \in P_i,    (13)

where r_i = b - A x_i is the residual at the ith iteration and P_i is the set of polynomials of degree at most i with value 1 at the origin. Minimal residual methods such as MINRES [Paige and Saunders, 1975] or GMRES [Saad and Schultz, 1986] compute a polynomial \phi such that

    ||r_i|| = \min_{\phi \in P_i} ||\phi(A) r_0||.    (14)

Further, if A is diagonalizable with A = V \Lambda V^{-1}, we have the following bound [Benzi et al., 2005]:

    ||r_i|| <= ||r_0|| cond(V) \min_{\phi \in P_i} \max_j |\phi(\lambda_j)|.    (15)

A bound of the type in equation (15) provides intuition for how the eigenvalue distribution influences the worst case behavior. For example, if all the eigenvalues are clustered tightly around a single point that is far away from the origin, we can expect rapid convergence. Widespread eigenvalues, however, would lead to slow convergence. In order to keep the number of iterations reasonable, efficient preconditioners are sometimes necessary; these transform the linear system into another linear system with more favorable properties. For more details regarding the convergence of Krylov subspace methods, the interested reader is referred to the following references [Benzi et al., 2005; Golub and Van Loan, 1996]. Generally speaking, a good preconditioner is a reasonable approximation to the inverse of a matrix, should be cheap to construct and to apply, and its application either clusters the eigenvalues or reduces the condition number of the matrix, or both. We observed that in certain instances the number of iterations required for convergence was quite large, and we therefore devise a preconditioner to reduce the number of iterations. We will show that this particular preconditioner clusters several of the eigenvalues around 1, resulting in fewer iterations.

[21] We make the following assumptions while determining the cost of our algorithm: the matrix R is diagonal, and the cost of a matrix-vector product involving H or H^T is denoted by \mu. For example, if H is dense then \mu = mn, and if H is sparse then \mu = nnz, where nnz is the number of nonzeros of H.

[22] The matrix A is never explicitly constructed, since Q is never constructed explicitly. The cost for setting up A is eliminated, but the cost of setting up Q using the H-matrix approach and forming \Psi is O(\mu p + k^2 m log m), where k is the block rank used in the H-matrix approach. The action of the matrix \Phi on a vector x of size n, written as q = \Phi x, can be computed in three steps: apply the adjoint of the measurement operator, z <- H^T x; compute the matrix-vector product w <- Q z approximately using the H-matrix approach described in section 3.1; and finally compute q <- H w + R x. The cost of computing one matrix-vector product with the matrix A involves the following operations: (1) multiplication with Q, (2) multiplications with H, H^T, and R, and (3) multiplication with \Psi and \Psi^T on appropriately sized vectors. The cost per matrix-vector product with A is O(k m log m + 2\mu + np + n).

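The three-step product q = \Phi x of paragraph [22] and the block operator A of equation (11) map directly onto a matrix-free solver. Below is a sketch using SciPy's GMRES (the paper's own solver is a restarted GMRES(50) in PETSc); Qmv stands in for the fast H-matrix product, and H, X, y are the objects from the first sketch.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def make_A(Qmv, H, r_diag, X):
    """Matrix-free operator for the saddle-point matrix A of equation (11).
    Qmv(z) applies Q (e.g., via an H-matrix); R = diag(r_diag)."""
    n, m = H.shape
    p = X.shape[1]
    Psi = H @ X                                    # n x p, formed once

    def matvec(t):
        xi, beta = t[:n], t[n:]
        z = H.T @ xi                               # step 1: adjoint of H
        w = Qmv(z)                                 # step 2: fast product with Q
        top = H @ w + r_diag * xi + Psi @ beta     # step 3, plus drift coupling
        return np.concatenate([top, Psi.T @ xi])

    return LinearOperator((n + p, n + p), matvec=matvec)

# Example: solve (9) iteratively, with the dense Q standing in for Q_H.
A_op = make_A(lambda z: Q @ z, H, np.full(H.shape[0], 1e-4), X)
rhs = np.concatenate([y, np.zeros(X.shape[1])])
sol, info = gmres(A_op, rhs, restart=50)
```

Note that only matrix-vector products with H, H^T, and Q are ever needed, which is what makes the per-iteration cost O(k m log m + 2\mu + np + n).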

3.3. Preconditioner

[23] In several applications, the covariance matrix Q has many eigenvalues that are practically zero. For such cases, the covariance matrix Q can be approximated, for the purposes of computing a useful preconditioner, by a low-rank decomposition obtained by retaining only the r largest eigenvalues and corresponding eigenvectors:

    Q \approx V_r \Lambda_r V_r^T,    (16)

where the columns of V_r contain the r eigenvectors corresponding to the largest eigenvalues, which form the diagonal of the matrix \Lambda_r. Of course, if we computed nearly all the eigenvalues and eigenvectors of Q, we could use the eigenvalue decomposition as a surrogate for the matrix Q itself; however, this is extremely expensive in practice. To compute a preconditioner, we only need an approximation to Q, which need not be very accurate. We have found that the number of eigenvalues and eigenvectors that need to be retained to compute such a low-rank representation is often on the order of 100 and is nearly independent of the dimension of the problem. The relative error in constructing such an approximation is

    \epsilon_r := \frac{||Q - V_r \Lambda_r V_r^T||_2}{||Q||_2} = \frac{\lambda_{r+1}}{\lambda_1}.    (17)

We use a matrix-free eigenvalue solver, for example the Krylov-Schur method, to compute the dominant eigenvalues and eigenvectors of the covariance matrix Q, with matrix-vector products accelerated in the usual fashion using hierarchical matrices, as described in section 3.1.

[24] For such matrices, we can use the above observation to construct an approximate inverse for the matrix \Phi, in the steps listed below. As before, we make the following assumptions: the matrix R is diagonal, and the cost of a matrix-vector product involving H or H^T is \mu.

[25] 1. Compute the low-rank representation of the matrix Q to obtain Q \approx V_r \Lambda_r V_r^T. The number of iterations required to compute the dominant portion of the spectrum using the Krylov-Schur method is independent of the size of the system, and the work required for computing this low-rank representation is O(r k m log m), where k is the block rank of the covariance matrix and r is the number of eigenvalues and eigenvectors of Q that we compute.

[26] 2. Form the matrix

    M = R^{-1/2} H V_r \Lambda_r^{1/2}.

The cost of this step is O(\mu r + r).

[27] 3. Compute the singular value decomposition (SVD) of the matrix M defined above,

    M = U \Sigma W^T,

which can be computed in O(n r^2), assuming that n > r.

[28] 4. Use the Sherman-Morrison-Woodbury update [Hager, 1989] to compute the inverse

    (M M^T + I)^{-1} = I - U D_r U^T,  D_r = diag( \frac{\sigma_i^2}{1 + \sigma_i^2} ),

where \sigma_i, i = 1, ..., min{n, r}, are the singular values of M. The only cost is forming D_r, which is O(r).

[29] 5. Compute the approximate inverse of \Phi, which we denote by \hat{\Phi}^{-1}, as

    \hat{\Phi}^{-1} = R^{-1} - R^{-1/2} U D_r U^T R^{-1/2}.

[30] This can be computed in O(nr).

[31] The cost of applying \hat{\Phi}^{-1} to a vector x is simply O(nr) and is thus independent of the number of unknowns m.

[32] Finally, we show how to use \hat{\Phi}^{-1} to construct a preconditioner for our system. We use the block diagonal form [Benzi et al., 2005], which is commonly used in preconditioning matrices arising from saddle point problems. Thus, the preconditioner for the matrix defined in equation (11) is

    P^{-1} = \begin{pmatrix} \hat{\Phi}^{-1} & 0 \\ 0 & \hat{S}^{-1} \end{pmatrix},  \hat{S} = \Psi^T \hat{\Phi}^{-1} \Psi.    (18)

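Steps 1-5 fit in a few lines of code. The sketch below builds the application of \hat{\Phi}^{-1}, with a dense eigendecomposition standing in for the matrix-free Krylov-Schur solver (an assumption that only works at toy scale); the names follow the notation of section 3.3.

```python
import numpy as np

def make_phi_inv(Q, H, r_diag, r=50):
    """Approximate inverse of Phi = H Q H^T + R via steps 1-5 of section 3.3.
    A dense eigendecomposition stands in for the matrix-free Krylov-Schur."""
    # Step 1: top-r eigenpairs of Q (Q symmetric positive semidefinite).
    lam, V = np.linalg.eigh(Q)
    lam, V = lam[::-1][:r], V[:, ::-1][:, :r]
    # Step 2: M = R^{-1/2} H V_r Lambda_r^{1/2}.
    M = (H @ V) * np.sqrt(lam) / np.sqrt(r_diag)[:, None]
    # Step 3: thin SVD of M.
    U, sig, _ = np.linalg.svd(M, full_matrices=False)
    # Step 4: D_r = diag(sig^2 / (1 + sig^2)) from the Woodbury update.
    d = sig ** 2 / (1.0 + sig ** 2)
    # Step 5: apply Phi_hat^{-1} = R^{-1} - R^{-1/2} U D_r U^T R^{-1/2}.
    def apply(x):
        t = U.T @ (x / np.sqrt(r_diag))
        return x / r_diag - (U @ (d * t)) / np.sqrt(r_diag)
    return apply
```

The block-diagonal preconditioner of equation (18) then applies this function in the first block and a small dense solve with \hat{S} = \Psi^T \hat{\Phi}^{-1} \Psi in the second; wrapped as a LinearOperator, it can be handed to GMRES through its M argument.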

3.4. Implementation Details

[33] We have implemented the above algorithm in C++, using PETSc [Balay et al., 1997, 2008; S. Balay et al., PETSc, available at http://www.mcs.anl.gov/petsc, 2009], which is a suite of data structures and routines for scalable solutions in scientific applications. Finally, we comment on the choice of the Krylov subspace method. The matrix A, as defined in equation (11), is symmetric but indefinite. This precludes the use of the conjugate gradient algorithm. Thus, we use a restarted GMRES(50) solver for the linear system of equations (9). We note that other possible choices are MINRES [Paige and Saunders, 1975] or transpose-free QMR [Freund, 1993]. The low-rank representation of the matrix Q, which is required in section 3.3, is computed using SLEPc [Hernandez et al., 2005], a package related to PETSc. We use the default Krylov-Schur option in SLEPc to compute the dominant eigenvalues of Q.

3.5. Other Approaches

[34] For any iterative solver applied to an arbitrary linear system of equations, it is difficult to guarantee convergence to a desired tolerance in a reasonable number of iterations. We briefly discuss an alternative approach to solving the system of equations (9). Instead of using an iterative solver, we can compute QH^T using fast matrix-vector products with the H-matrix approach. Subsequently, we form the dense matrix \Phi and solve the system of equations (9) using a direct solver such as Gaussian elimination. However, it should be noted that this process can be expensive for a large number of measurements n, because forming QH^T requires O(m n log m) operations. Moreover, solving the system of equations has a complexity of O((n + p)^3), which can be rather high both in terms of storage and computation time for large n. By comparison, the iterative solver, when it converges in a few iterations, requires far fewer matrix-vector products involving Q (in practice, much fewer than the number of measurements) than the direct approach. Therefore, we recommend the direct approach only when the number of measurements is small or when the iterative solver fails to converge to the desired tolerance.

4. Uncertainty Quantification and Conditional Realizations

[35] A commonly used strategy [Kitanidis, 1995; Zanini et al., 2009] to quantify the uncertainty associated with the estimate of the solution is to compute conditional realizations. This method avoids the computation of the posterior covariance matrix, which is expensive for large-scale problems.

4.1. Conditional Realizations

[36] We first generate unconditional realizations v_u ~ N(0, R) and s_u ~ N(0, Q), which are realizations corresponding to the pdf of the noise and the prior probability of the unknown parameters s, respectively. The conditional realizations [Kitanidis, 1995; Zanini et al., 2009] are generated by first solving the linear inversion system with a modified right-hand side,

    \begin{pmatrix} HQH^T + R & HX \\ (HX)^T & 0 \end{pmatrix} \begin{pmatrix} \hat{\xi} \\ \hat{\beta} \end{pmatrix} = \begin{pmatrix} y - H s_u + v_u \\ 0 \end{pmatrix},    (19)

and then the conditional realization can be computed from the following equation:

    \hat{s}_c = s_u + X\hat{\beta} + QH^T \hat{\xi}.    (20)

Note that the system of equations (19) has the same left-hand side as the original system of equations (9), with modifications only to the right-hand side. Thus, conditional simulations can be implemented without much recoding.

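Given a sampler for the unconditional realizations (section 4.2 below), one conditional realization costs one extra solve with the operator already built for equation (9). A minimal sketch, reusing A_op, H, X, and the GMRES call from the earlier sketches, and assuming s_u and v_u have been drawn already:

```python
import numpy as np
from scipy.sparse.linalg import gmres

def conditional_realization(A_op, H, Qmv, X, y, s_u, v_u):
    """One conditional realization via equations (19)-(20).
    s_u ~ N(0, Q) and v_u ~ N(0, R) are assumed given."""
    n = H.shape[0]
    rhs = np.concatenate([y - H @ s_u + v_u, np.zeros(X.shape[1])])
    sol, info = gmres(A_op, rhs, restart=50)      # same operator as eq. (9)
    xi, beta = sol[:n], sol[n:]
    return s_u + X @ beta + Qmv(H.T @ xi)         # equation (20)
```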

4.2. Generating Unconditional Realizations

[37] We now address the issue of generating the unconditional realizations v_u ~ N(0, R) and s_u ~ N(0, Q). In most applications, we assume the noise to be uncorrelated, that is, R is a diagonal matrix. Thus, we can easily generate the unconditional realization v_u = R^{1/2} x, where x ~ N(0, I). However, generating the unconditional realization s_u is much more expensive because Q is dense. A procedure often mentioned in the literature is to form the Cholesky decomposition Q = F^T F, where F is an upper triangular matrix; one can then compute s_u = F^T x, where x ~ N(0, I). But since Q is dense, Cholesky factorization, which scales as O(m^3), where m is the number of unknown parameters, becomes very expensive to compute. Moreover, Cholesky factorization requires explicit storage of Q, which is not feasible.

[38] We now describe a more efficient approach to sampling, which relies on a polynomial approximation to the square root and fast matrix-vector products. We adopt a procedure similar to that of Dietrich and Newsam [1995]; the only difference between the two approaches is the manner in which the matrix-vector products are computed: our approach uses the H-matrix approach, whereas theirs uses FFT based methods. To sample from N(0, Q), we need to compute

    s_u = Q^{1/2} x,  x ~ N(0, I).

This meets our requirements because

    E[s_u] = 0,  E[s_u s_u^T] = Q^{1/2} E[x x^T] Q^{1/2} = Q.

The method approximates the square root of a matrix using Chebyshev matrix polynomials, which can be generated by a three-term recurrence. The explicit computation of Q^{1/2} is not necessary: the quantity of interest is s_u = Q^{1/2} x, whose polynomial approximation requires only matrix-vector products involving Q and can be computed as

    s_u \approx \sum_{i=0}^{L} a_i x_i,

where

    x_0 = x,  x_1 = (d_a Q + d_b I) x,  x_{n+1} = 2 (d_a Q + d_b I) x_n - x_{n-1}.

The coefficients a_i are the representation of the square root function in the Chebyshev polynomial basis on the interval [\lambda_min, \lambda_max] [Mason and Handscomb, 2003], and the coefficients d_a and d_b, which map [\lambda_min, \lambda_max] to [-1, 1], are given by

    d_a = \frac{2}{\lambda_max - \lambda_min},  d_b = -\frac{\lambda_max + \lambda_min}{\lambda_max - \lambda_min},

where \lambda_max and \lambda_min are the largest and smallest eigenvalues of the matrix Q. Thus, in order to generate unconditional realizations s_u ~ N(0, Q), we need access only to the extreme eigenvalues and to fast matrix-vector products involving the covariance matrix Q, which we compute using the H-matrix approach detailed in section 3.1. The extreme eigenvalues can be computed using a matrix-free Krylov-Schur algorithm, which we access via the SLEPc package. In Appendix A, we extend the analysis of Dietrich and Newsam [1995] to derive an error bound for the Chebyshev approximation of the square root that includes the approximation error due to the hierarchical matrix approach.

[39] In practice, the number of terms in the Chebyshev series necessary to obtain the desired approximation tolerance grows proportionally to the square root of the largest eigenvalue [Dietrich and Newsam, 1995]. Furthermore, whenever the smallest eigenvalue of Q tends to zero, the convergence of the Chebyshev series will be slow; this is characteristic of any polynomial approximation of the square root, because the origin is a singularity of the first derivative of \sqrt{x}. A workaround for this problem is the so-called "nugget effect," which involves adding a diagonal term to the covariance matrix, Q + \delta I with \delta > 0, so that the matrix has a spectrum lying in the interval [\delta, \lambda_max + \delta] and \sqrt{x} is approximated on that same interval. This is the same as computing the Chebyshev expansion of \sqrt{x + \delta}. The nugget effect keeps the number of terms in the Chebyshev expansion small, which makes the algorithm more efficient, but it adds some small-scale variability that is often negligible in practice.

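The recurrence above is short in code. In the sketch below, the Chebyshev coefficients of \sqrt{t} on [\lambda_min, \lambda_max] are computed numerically by interpolation at Chebyshev points (a standard formula, standing in for the closed-form coefficients in Mason and Handscomb [2003]); Qmv is any fast matvec, e.g., an H-matrix product.

```python
import numpy as np

def sample_prior(Qmv, m, lam_min, lam_max, L=200, rng=None):
    """Approximate s_u = Q^{1/2} x with x ~ N(0, I), using Chebyshev
    matrix polynomials and the three-term recurrence of section 4.2."""
    rng = rng or np.random.default_rng()
    # Chebyshev coefficients a_i of sqrt on [lam_min, lam_max],
    # via interpolation at the L+1 Chebyshev points.
    k = np.arange(L + 1)
    theta = np.pi * (k + 0.5) / (L + 1)
    t = 0.5 * (lam_max + lam_min) + 0.5 * (lam_max - lam_min) * np.cos(theta)
    f = np.sqrt(t)
    a = np.array([2.0 / (L + 1) * np.sum(f * np.cos(i * theta)) for i in k])
    a[0] *= 0.5

    da = 2.0 / (lam_max - lam_min)
    db = -(lam_max + lam_min) / (lam_max - lam_min)
    B = lambda v: da * Qmv(v) + db * v          # maps spectrum of Q into [-1, 1]

    x = rng.standard_normal(m)
    x_prev, x_cur = x, B(x)                     # x_0 and x_1
    s = a[0] * x_prev + a[1] * x_cur
    for i in range(2, L + 1):
        x_prev, x_cur = x_cur, 2.0 * B(x_cur) - x_prev
        s += a[i] * x_cur
    return s
```

In this form the cost per sample is L fast matrix-vector products with Q, and the nugget \delta of paragraph [39] simply shifts lam_min away from zero.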

5. Numerical Experiments

[40] We now demonstrate the performance of our algorithm on two applications: interpolation involving noisy measurements of a standard test function, and contaminant source identification. In the first application the measurement operator is extremely sparse; in the second, the measurement operator is not constructed explicitly, and we only have access to matrix-vector products with H and H^T. The above method was implemented in C++ using PETSc libraries for the linear solvers. All test cases were run on a workstation with an Intel Xeon E7540 2 GHz processor and 128 GB RAM, running Ubuntu 11.04.

5.1. Application: Interpolation

[41] To test our algorithm, we use a function introduced by Franke [Franke, 1982], which is frequently used as a test case for interpolation algorithms:

    F(x_1, x_2) = \frac{3}{4} e^{-((9x_1-2)^2 + (9x_2-2)^2)/4} + \frac{3}{4} e^{-(9x_1+1)^2/49 - (9x_2+1)/10} + \frac{1}{2} e^{-((9x_1-7)^2 + (9x_2-3)^2)/4} - \frac{1}{5} e^{-(9x_1-4)^2 - (9x_2-7)^2}.    (21)

Figure 1 shows a plot of Franke's function, which we consider as the true field for this problem (see also Figure 2).

Figure 1. Plot of Franke's function defined in equation (21) in the domain [0, 1]^2.

The setup for our problem is as follows. We pick m points distributed uniformly at random in the domain [0, 1]^2. Of these m points, we pick n points at random, evaluate the function F(., .) defined in equation (21) at them, and add 0.01% noise. Thus, in this case both the measurement locations and the evaluation points are irregularly distributed, which precludes the use of FFT based methods. The measurement operator H for the interpolation problem is composed of entries 0 and 1; the nonzero entries occur where a measurement location coincides with the location of an unknown. Thus, H is extremely sparse and contains only n nonzero entries. We solve the system of equations (9) and then evaluate the function at the m points. We make the following modeling choices. We pick X = [1, ..., 1]^T, R = 10^{-4} I, and the covariance matrix Q corresponding to the covariance kernel

    K(r) = (1 + \alpha r + (\alpha r)^2 / 3) exp(-\alpha r),    (22)

which is a member of the Matérn covariance family. The reason for this choice will be explained shortly. We pick \alpha^{-1} = 0.5 L for most of our calculations, where L is the maximum domain size. For the construction of the H-matrix Q_H, we pick the error tolerance \epsilon = 10^{-9}.

[42] Table 1 summarizes the performance of our iterative algorithm. For a given number of points m, we pick n to be 1%, 5%, and 10% of m. As the number of measurements increases, the error decreases. This is what we should expect, because we have more information about our system. The number of iterations appears to increase linearly with the number of measurements. While the number of iterations without the preconditioner is not too large, with an increasingly large number of measurements the iteration count could become problematic in practice. To alleviate this, we use the preconditioner proposed in section 3.3, with the number of retained eigenvalues and eigenvectors r set to 50. Refer to Table 2 for the results of the interpolation problem using the preconditioner.

Table 1. Performance of the Iterative Scheme for the Interpolation Problem Without Preconditioner^a

Unknowns   Measurements   Iterations   Relative Error
10,000     100            33           0.0920
10,000     500            43           0.0345
10,000     1000           49           0.0242
50,000     500            44           0.0344
50,000     2500           81           0.0149
50,000     5000           95           0.0087
100,000    1000           49           0.0225
100,000    5000           95           0.0087
100,000    10,000         95           0.0088
500,000    5000           95           0.0225
500,000    10,000         133          0.0052
500,000    50,000         239          0.0015

^a A relative tolerance of ||r_k||/||r_0|| < 10^{-6} was used as the termination criterion for the iterative solver.

Table 2. Performance of the Iterative Scheme for the Interpolation Problem With Preconditioner^a

Unknowns   Measurements   Iterations   Relative Error
10,000     100            6            0.085
10,000     500            8            0.040
10,000     1000           9            0.023
50,000     500            8            0.030
50,000     2500           13           0.0145
50,000     5000           17           0.009
100,000    1000           9            0.0249
100,000    5000           18           0.0085
100,000    10,000         23           0.0055
500,000    5000           17           0.0093
500,000    10,000         22           0.0057
500,000    50,000         31           0.0016

^a For the preconditioner, we choose r = 50. A relative tolerance of ||r_k||/||r_0|| < 10^{-6} was used as the termination criterion for the iterative solver.

[43] Next, we study the convergence of the iterative scheme for various covariance kernels. We choose a few different covariance kernels, the first four from the Matérn covariance family and two others frequently used in radial basis function interpolation. The parameters were chosen such that neither the error, as compared to the true solution, nor the number of iterations was too high. We pick n = 1000 and m = 100,000 and tabulate the number of iterations for each covariance kernel along with the relative L2 error. The results are shown in Table 3. We observe that the best convergence among the covariance kernels in the Matérn family is obtained by the one with the second-order polynomial prefactor, for a tolerance of relative L2 error less than 10%, and so we use it henceforth. Even though the radial basis function covariance kernels seem to perform better, they are not positive definite but only conditionally positive definite, and special care is needed to treat them, especially for generating unconditional realizations.

Table 3. Performance of the Iterative Scheme for Various Covariance Kernels^a

Kernel K(r)                              Iterations   Relative Error
exp(-ar)                                 84           0.00447
(1 + ar) exp(-ar)                        153          0.0011
(1 + ar + (ar)^2/3) exp(-ar)             48           0.022
exp(-(ar)^2)                             35           0.1122
log r                                    23           0.0756
r^2 log r                                179          0.0012

^a Here r = ||x - y|| and a = 2.

[44] The geostatistical approach can also be used to estimate the derivative of the function from noisy measurements of the function. To see this, we consider the best approximation \hat{s}(.) to the underlying function s(.) that we are trying to estimate. After computing \hat{\beta} and \hat{\xi} as the solution of the system of equations (9), we can compute \hat{s}(.) as

    \hat{s}(x) = \sum_{k=1}^{p} f_k(x) \hat{\beta}_k + \sum_{i=1}^{m} K(||x - x_i||) (H^T \hat{\xi})_i.    (23)

Therefore, we can compute the approximation to \nabla s(.), the gradient of the underlying function, by computing \nabla \hat{s}(.) with the following formula:

    \nabla \hat{s}(x) = \sum_{k=1}^{p} \nabla f_k(x) \hat{\beta}_k + \sum_{i=1}^{m} K'(||x - x_i||) \frac{x - x_i}{||x - x_i||} (H^T \hat{\xi})_i.    (24)

It is easy to verify that this generating function is asymptotically smooth as well; therefore, it can be approximated by a blockwise low-rank matrix using the procedure outlined in section 3.1.

Figure 2. (left) ||\nabla F|| for Franke's function and (right) reconstruction of the derivative ||\nabla \hat{s}|| from the geostatistical approach for m = 3600 and n = 200. The relative L2 error is 0.1965.

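The post-processing formulas (23)-(24) are easy to write down for the kernel of equation (22). In the sketch below, beta_hat and xi_hat would come from the solve of (9), w = H^T \hat{\xi} is precomputed, the derivative K'(r) is obtained by differentiating equation (22), and the O(m) sums are written naively.

```python
import numpy as np

def evaluate_estimate(xq, pts, w, beta_hat, alpha=2.0):
    """Evaluate s_hat(xq) and grad s_hat(xq) via equations (23)-(24)
    for K(r) = (1 + ar + (ar)^2/3) exp(-ar), with constant drift f_1 = 1.
    w = H^T xi_hat has length m."""
    d = xq[None, :] - pts                       # (m, dim) displacements
    r = np.linalg.norm(d, axis=1)
    ar = alpha * r
    K = (1.0 + ar + ar ** 2 / 3.0) * np.exp(-ar)
    # K'(r) = -(alpha/3) (ar)(1 + ar) exp(-ar), from differentiating (22).
    dK = -(alpha / 3.0) * ar * (1.0 + ar) * np.exp(-ar)
    s = beta_hat[0] + K @ w
    safe = r > 1e-12                            # skip the singular i = query point
    grad = (d[safe] * (dK[safe] * w[safe] / r[safe])[:, None]).sum(axis=0)
    return s, grad
```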

[45] We conclude this application by presenting results for the conditional realizations, as outlined in section 4. We first generate unconditional realizations s_u and v_u using the procedure outlined in section 4.2. We compute the extreme eigenvalues using the Krylov-Schur algorithm accessed via the SLEPc package [Hernandez et al., 2005]. We take the number of Chebyshev polynomials in the approximation to be 100. For this problem we choose m = 90,000 and n = 1000. We used the same measurements and measurement locations for generating all three conditional realizations. The results are presented in Figure 3.

Figure 3. Conditional realizations for the interpolation problem with m = 90,000 and n = 1000.

5.2. Application: Contaminant Source Identification

[46] We now introduce a large-scale linear inverse problem governed by a transient advection-diffusion transport model. The inverse problem in this case is to determine the initial concentration of the contaminant given noisy measurements taken in time from spatially distributed sensors, assuming a suitable prior covariance matrix and the PDE model that we use to simulate the transport.

5.2.1. Problem Setup

[47] Consider the transient advection-diffusion problem that describes the evolution of the contaminant:

    \partial u / \partial t - D \Delta u + v \cdot \nabla u = 0   in \Omega x [0, T],
    u = u_0                                                       in \Omega x {t = 0},
    u = 0                                                         on \partial\Omega_D x [0, T],
    \partial u / \partial n = 0                                   on \partial\Omega_N x [0, T],    (25)

where u is the contaminant concentration, u_0 is the initial contamination field that we would like to determine, D is the diffusion coefficient, v is the known velocity field, and T is the final time.

[48] The forward problem is discretized using linear finite elements. The time integration is carried out by the Crank-Nicolson method. At each time step, the resulting system of equations is solved using transpose-free QMR (TFQMR) with an ILU preconditioner.

[49] We work with dimensionless quantities throughout. We assume that the domain of the problem is [0, L]^2 and that the velocity field v is a Poiseuille flow over this domain, that is, v_x = v_0 (L/2 + y)(L/2 - y)/(L/2)^2, v_y = 0, so that the Peclet number based on the centerline velocity is Pe = L v_0 / D. The molecular diffusion D is chosen to be 1. Observations are collected at distinct locations x_j, j = 1, ..., n_m, uniformly distributed in the domain, and n_t measurements are taken evenly in [0, T], so that in all we have n = n_m x n_t measurements. Furthermore, in the rest of the section, we pick L = T = 1, the centerline fluid velocity v_0 = 1, and the time step for the transient advection-diffusion problem \Delta t = 0.05. Figure 4 shows the reconstruction of the true field, which we assume to be 2 exp(-30 ||x - x_c||), centered at x_c = (0.25, 0.5). The unknowns are discretized on a 100 x 100 grid in space and 20 time steps, in the domain with L = T = 1. We pick v_0 = 1. The measurements are collected on a 10 x 10 x 10 grid, i.e., n_m = 100, n_t = 10. The relative error in the reconstruction was 0.063.

[50] We pick an irregular grid to demonstrate the utility of our algorithm. Suppose we had reason to believe that our source is located in [0.125, 0.375] x [0.375, 0.625] and we want to resolve the location of the source better. Instead of refining the grid everywhere in the domain, we can refine the grid by a factor of 2 only in the region of interest. This is the principal advantage that we have over FFT based methods, which rely on regular equispaced grids. An example of the grid that we used is given in Figure 5.

[51] The advection-diffusion problem described in equations (25), after discretization, can be written as the linear system of equations Ku = Tu_0. Here K is the discretized time-dependent convection-diffusion operator and T is the operator that maps the initial condition u_0 into space-time. The measurement operator H can be constructed as H = BK^{-1}T, where B is the restriction operator that extracts the sensor measurements. Of course, it may not be feasible to explicitly construct this matrix. Instead, we can compute the matrix-vector product Hx in the following steps (a sketch follows below): convert the initial condition to space-time, z <- Tx; solve the time-dependent advection-diffusion equation, w <- K^{-1}z; and finally extract the measurements, q <- Bw. The transpose of the measurement operator times a vector can be computed in an analogous fashion. As before, we pick X = [1, ..., 1]^T, R = 10^{-4} I, and the covariance matrix Q corresponding to the covariance kernel in equation (22), with \alpha^{-1} = 0.1 L for all of our calculations. For the construction of the H-matrix Q_H, we pick the error tolerance \epsilon = 10^{-9}.

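The three-step product with H = BK^{-1}T can be sketched for a toy time-stepping version of equation (25): here M_left and M_right are hypothetical sparse Crank-Nicolson matrices (one step solves M_left u^{k+1} = M_right u^k) and sensor_idx selects the sensor degrees of freedom. This mirrors the steps above without reproducing the authors' finite element code.

```python
import numpy as np
import scipy.sparse.linalg as spla

def apply_H(x, M_left, M_right, sensor_idx, n_steps, obs_every):
    """q = H x = B K^{-1} T x for the discretized model (25).
    One Crank-Nicolson step solves M_left u^{k+1} = M_right u^k;
    sensor_idx selects sensor dofs, obs_every controls n_t."""
    lu = spla.splu(M_left.tocsc())        # factor once, reuse at every step
    u = x.copy()                          # T maps u_0 into the state at t = 0
    q = []
    for k in range(1, n_steps + 1):
        u = lu.solve(M_right @ u)         # advance one time step
        if k % obs_every == 0:
            q.append(u[sensor_idx])       # restriction operator B
    return np.concatenate(q)
```

The transpose H^T runs the analogous loop in reverse with M_left^T and M_right^T, scattering sensor values instead of gathering them; each product with H or H^T therefore costs one full forward (or adjoint) transport solve.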

Figure 4. Reconstruction of a Gaussian initial condition 2 exp(-30 ||x - x_c||) centered at x_c = (0.25, 0.5). The unknowns are discretized on a 100 x 100 grid in space and 20 time steps, in the domain with L = T = 1. We pick v_0 = 1. The measurements are collected on a 10 x 10 x 10 grid, i.e., n_m = 100, n_t = 10. (left) Reconstructed field and (right) true field. The relative error in the reconstruction was 0.063.

Figure 5. An example of the grid used for the linear inversion. The number of grid points is 10,000, and the grid is refined in the region x = [0.125, 0.375] and y = [0.375, 0.625].

5.2.2. Performance of the Iterative Method

[52] In this subsection, we study the convergence of the iterative method for different grid sizes, numbers of measurements, and covariance kernels. In all the cases that we discuss, perfect reconstruction is not possible because of the nonzero regularization term R and the fixed mesh size. In practice, the relative L2 error in our reconstruction is usually less than 10%. Without the preconditioner, the number of iterations, even for moderately large problem sizes, well exceeds 300. This is unacceptably large in practice because the number of times the time-dependent advection problem is solved is more than double the number of iterations, i.e., greater than 600. Thus, we present results only for the preconditioned case.

[53] First, we study the convergence of the iterative scheme as the number of sensors increases. With an increase in the number of sensors, the problem becomes more well-posed because of the new information provided by the additional sensors. Thus, one might expect quicker inversion. However, this does not seem to be the case: the iterative solver needs to perform more work to incorporate the additional information.

[54] In Table 4, we summarize the performance of the iterative scheme. For the preconditioner, we use r = 100 eigenvalues, and the relative error in the low-rank approximation to Q (see equation (17)) is \epsilon_r ~ 10^{-5}. We immediately observe that the accuracy of the inversion increases with the number of measurements. This is to be expected, since with an increase in the number of measurements, the amount of information entering the calculations increases. The number of iterations required to converge to the desired tolerance remains bounded for a moderate number of measurements (~1000), but the number of iterations grows sharply when the number of sensors is increased dramatically. Under such circumstances, we increase the number of dominant eigenvalues and eigenvectors r used to approximate Q in the preconditioner. We also observe that, for a given number of measurements, the number of iterations needed to converge does not depend on the number of unknowns; in this sense, the convergence is essentially grid independent. This is a very desirable property, because the algorithm scales to an arbitrarily large number of unknowns. In all, the maximum problem size that we solved had 9 x 10^4 inversion parameters and 1.8 million space-time unknowns.

Table 4. Performance of the Iterative Scheme for the Contaminant Source Identification Problem^a

Sensors   Unknowns     Iterations   Relative Error
8 x 8     100 x 100    30           0.0941
8 x 8     200 x 200    32           0.0949
8 x 8     300 x 300    33           0.0953
10 x 10   100 x 100    38           0.0669
10 x 10   200 x 200    39           0.0675
10 x 10   300 x 300    41           0.0679
12 x 12   100 x 100    136          0.0495
12 x 12   200 x 200    196          0.0503
12 x 12   300 x 300    200          0.0500

^a In each case the number of time measurements was n_t = 10, with L = T = 1 and \Delta t = 0.05. For the preconditioner, we used r = 100.

[55] We now consider the problem with a large number of measurements. We pick the number of sensors to be 25 x 25 and, since we take n_t = 10 measurements in time, altogether we have 6,250 measurements. We increase r, the number of eigenvalues and eigenvectors of Q that we compute, and study the convergence of the iterative method on this problem. With increasing r, the approximation of the eigendecomposition to the matrix Q improves, and this results in a better preconditioner because it approximates the matrix A better. The downside is that the setup cost of the preconditioner also increases significantly: not only do we have to compute more eigenvalues and eigenvectors of Q, but we also need to multiply with H r times (see step 2 in section 3.3), which is expensive because each matrix-vector product with H involves solving the transient advection-diffusion equation. We set the maximum number of iterations to 300. In Table 5 we list the number of iterations for a given r and the resulting relative L2 error of the computation.


Table 5. Performance of the Iterative Scheme With Increasing r for Grid Size 100 x 100 and 25 x 25 Sensors, so That the Number of Measurements Is 6250

r     Iterations   \epsilon_r       Relative Error
129   300^a        2.45 x 10^{-5}   0.0148
201   196          4.76 x 10^{-6}   0.0146
278   96           1.75 x 10^{-6}   0.0145
355   41           7.09 x 10^{-7}   0.0144

^a Reached the maximum number of iterations, 300, without converging to the desired solver tolerance.

We observe that, with increasing r, the number of iterations needed to converge to the desired tolerance (here the same criterion as for interpolation, ||r_k||/||r_0|| < 10^{-6}) decreases. Therefore, we can view this scenario as a tradeoff between setup time and computation time. If we have to solve the linear inverse problem for multiple right-hand sides, i.e., for multiple sets of measurements, then it is worthwhile to compute the preconditioner with a larger r, which will result in faster convergence.

5.2.3. Spectrum of the Preconditioned Matrix

[56] To understand the workings of the preconditioner, we plot the spectrum of the preconditioned operator. Since it is extremely expensive to compute the full spectrum of the operator, we restrict ourselves to the case when the number of sensors is 10 x 10 and the number of spatially varying unknowns is at most 100 x 100. In setting up the preconditioner, we explicitly compute 100 dominant eigenvalues and their corresponding eigenvectors. Even though we request only 100 eigenvalues from the SLEPc package, in practice a few more eigenvalues converge (135) in the Krylov-Schur iterations, and so we use these in our preconditioner as well. In Figure 6, we plot the absolute eigenvalues of the preconditioned operator. As can be seen in Figure 6, most of the eigenvalues cluster around 1, and the spectrum is more or less independent of the number of unknowns. This explains the grid-independent convergence that we observe for the problem sizes that we considered. We observed a similar trend for different numbers of sensors as well.

Figure 6. (left) Eigenvalues of the preconditioned operator. (right) Plot of #{\lambda_i : |\lambda_i - 1| > x} against x for all eigenvalues. For these plots, we assumed the number of observations to be 10 x 10 x 10; the number of unknowns varied from 20 x 20 to 100 x 100. The covariance kernel is defined in equation (22).

5.2.4. Comparison With FFT Based Methods on Regular Grids

[57] We conclude this application with a comparison of our algorithm against an FFT based approach on regular grids. For the setup of the problem, we pick 10 x 10 sensors with n_t = 10 and \Delta t = 0.05. The regular grid is constructed for a problem size m = 300 x 300. Everything else remains the same as in section 5.2. For comparison purposes we pick the following three strategies.

[58] 1. In Strategy 1, the matrix-vector products are computed using the FFT based method. We form QH^T first, then form the matrix A, and solve the system of equations using a direct solver.

[59] 2. In Strategy 2, the matrix-vector products are computed using the FFT based method. The rest of the algorithm is the same as the one described in section 3.

[60] 3. In Strategy 3, we use the algorithm described in section 3, with matrix-vector products computed using the H-matrix approach.

[61] The first strategy is the direct implementation of FFT based methods that is prevalent in the literature. We note that it can be expensive because it involves constructing H, which, as already discussed, can be extremely expensive. Thus, we propose a second alternative, which is to use our original algorithm but modify it so that the matrix-vector products are computed using FFT based approaches instead of the H-matrix approach (a sketch of such an FFT based product is given below). These two strategies were implemented in MATLAB. This might not be a fair comparison, because we are comparing two different programming languages and programming styles.

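For reference, the FFT based product used by Strategies 1 and 2 exploits the (block) Toeplitz structure of Q on a regular grid via circulant embedding. A minimal 2-D sketch, assuming a stationary kernel K(r) (e.g., the matern function from the first sketch) on an nx x ny grid with spacings hx, hy; this is Python for illustration, whereas the comparison runs used MATLAB.

```python
import numpy as np

def toeplitz_matvec_2d(kernel, nx, ny, hx, hy, v):
    """q = Q v on a regular nx x ny grid for a stationary kernel K(r),
    via circulant embedding and 2-D FFTs: O(N log N) instead of O(N^2)."""
    # Symbol of the periodic embedding: kernel at the wrapped lag distances.
    ix = np.minimum(np.arange(2 * nx), 2 * nx - np.arange(2 * nx)) * hx
    iy = np.minimum(np.arange(2 * ny), 2 * ny - np.arange(2 * ny)) * hy
    r = np.hypot(ix[:, None], iy[None, :])
    c_hat = np.fft.fft2(kernel(r))        # eigenvalues of the circulant embedding

    vpad = np.zeros((2 * nx, 2 * ny))
    vpad[:nx, :ny] = v.reshape(nx, ny)    # zero-pad the input vector
    q = np.fft.ifft2(c_hat * np.fft.fft2(vpad)).real
    return q[:nx, :ny].ravel()            # restrict back to the original grid
```

This is exactly the structure that breaks down on the locally refined grid of Figure 5, which is where the H-matrix product retains its advantage.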

[57] Finally, we compare the performance of our algorithm with FFT based methods on a regular grid. We consider three strategies for solving the system of equations.
[58] 1. In Strategy 1, the matrix-vector products are computed using the FFT based method. We form QH^T first, then form the matrix A, and solve the system of equations using a direct solver.
[59] 2. In Strategy 2, the matrix-vector products are computed using the FFT based method. The rest of the algorithm is the same as the one described in section 3.
[60] 3. In Strategy 3, we use the algorithm described in section 3, with the matrix-vector products computed using the H-matrix approach.
[61] The first strategy is the direct implementation of FFT based methods that is prevalent in the literature. We note that it can be expensive because it involves explicitly forming QH^T and A which, as we have already discussed, can be extremely expensive. Thus, we propose a second alternative, which is to use our original algorithm but modify it such that the matrix-vector products are computed using FFT based approaches as opposed to the H-matrix approach. The first two strategies were implemented in MATLAB, so this might not be an entirely fair comparison, because we are comparing two different programming languages and programming styles.
[62] Table 6 compares the time taken to set up and solve the system of equations for the three strategies. As expected, the bulk of the computational time in Strategy 1 is spent in setting up the system, whereas the time required for solving the system is very small because of very efficient direct methods; indeed, Strategy 1 is the slowest of the three. We expect Strategy 2 to outperform Strategy 3, because it not only avoids forming the H-matrix approximation of Q but also uses FFT based matrix-vector products, which are much faster than H-matrix matrix-vector products. In fact, it is nearly twice as fast, which can represent significant savings in time for much larger problem sizes. FFT based methods have proven to be extremely fast, in no small part because of the availability of finely tuned libraries to perform FFT operations. Even though our algorithm is slower, with a sufficiently good implementation it would be competitive with FFT based methods even on regular grids. Of course, FFT based methods are not easily applicable on irregular grids, and thus the default choice there should be to use our algorithm.

Table 6. Comparison of the Three Strategies on a Regular 300 × 300 Grid^a

Strategy   Setup Q   Setup QH^T   Setup Preconditioner   Solve
1          0.6       10559.64     –                      12.298
2          0.6       –            1360.62                811.61
3          77.61     –            2917.95                3265.5

^a Time measured in seconds.
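The FFT based matrix-vector products used in Strategies 1 and 2 rely on the fact that, for a stationary covariance kernel on a regular equispaced grid, Q is (block) Toeplitz and can be embedded in a circulant matrix. The following is a minimal 1-D sketch of this standard construction (illustrative, not the code used for Table 6); the exponential kernel and the grid are arbitrary choices.

    import numpy as np

    def toeplitz_matvec_fft(first_col, x):
        # Multiply the symmetric Toeplitz covariance Q (given by its first
        # column) with x by embedding Q in a circulant matrix of size
        # 2n - 2 and applying the convolution theorem: O(n log n) work.
        n = len(first_col)
        c = np.concatenate([first_col, first_col[-2:0:-1]])
        x_pad = np.concatenate([x, np.zeros(len(c) - n)])
        y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad)).real
        return y[:n]

    n, ell = 1000, 0.1                     # grid size and correlation length
    s = np.linspace(0.0, 1.0, n)
    q0 = np.exp(-np.abs(s - s[0]) / ell)   # exponential kernel, first column
    x = np.random.randn(n)
    y = toeplitz_matvec_fft(q0, x)

    Q = np.exp(-np.abs(s[:, None] - s[None, :]) / ell)   # dense check
    assert np.allclose(y, Q @ x)

In two dimensions the same idea applies to the block Toeplitz structure, which is why the Setup Q column for Strategies 1 and 2 in Table 6 is so small.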
6. Conclusion
[63] We have described an efficient numerical method to compute the best estimate for a linear underdetermined set of equations using the stochastic Bayesian approach for geostatistical applications. We emphasize the generality of our formulation: it is capable of handling a wide variety of generalized covariance functions K(·, ·) with a minimum amount of recoding, and it is applicable to situations in which the unknowns are distributed on an irregularly spaced grid. Thus, unlike FFT based methods, we are not restricted to regular, equispaced grids. However, FFT based methods are hard to beat on regular grids because of the availability of finely tuned software. Therefore, if it is known a priori that the grid is regular, we recommend using the FFT to accelerate the matrix-vector products instead of the H-matrices and using the rest of the algorithms described in section 3.
[64] The key idea was exploiting the Hierarchical matrix formulation, which allows us to compute fast approximate matrix-vector products involving the covariance matrix Q. Our method scales well up to measurements of size 5 × 10^4 and unknowns of size up to 5 × 10^5 on irregular grids. Under some circumstances, the number of iterations required to converge to the desired tolerance becomes unacceptably large, and we need to use the preconditioner that we proposed in section 3.3. The setup cost can be quite large for large problems because it involves computing the dominant spectrum of the covariance matrix Q. Further, exploiting a result from polynomial approximation theory, we approximated the square root of the covariance matrix Q using Chebyshev polynomials to generate unconditional realizations, which were then used to generate conditional realizations via a slight modification of the system of equations (9). The extension to 3-D is also straightforward.

Appendix A: Approximation Error in the Chebyshev Matrix Polynomial Approach to Computing the Square Root of an H-Matrix
[65] In section 4.2, we described an algorithm for approximating the square root of a matrix Q using Chebyshev matrix polynomials. We further assumed that, instead of using the matrix itself, we replace it by its H-matrix approximation Q_H. We now compute the error that results from approximating the square root of the matrix using Chebyshev matrix polynomials while also approximating the matrix itself using the H-matrix approach.
[66] We assume that the matrix Q is symmetric positive definite, with Q = V Λ V^T, where V is the matrix of eigenvectors and Λ is the diagonal matrix whose entries are the eigenvalues λ_1, …, λ_m of Q, arranged in decreasing order. Further, we know that Q^k = V Λ^k V^T and p_{n_c}(Q) = V p_{n_c}(Λ) V^T.
[67] With n_c terms of the Chebyshev matrix polynomial, we can approximate the square root function as follows:

√x = p_{n_c}(x) + ε(x),

where p_{n_c}(x) is the polynomial corresponding to the truncated Chebyshev series and ε(x) is the remainder. Therefore,

Q^{1/2} = V (p_{n_c}(Λ) + D) V^T = p_{n_c}(Q) + V D V^T,  where  D = diag(ε(λ_1), ε(λ_2), …, ε(λ_m)).

As a result of using Q_H instead of Q, i.e., by introducing the Hierarchical matrix approximation, we compute p_{n_c}(Q_H) instead of p_{n_c}(Q). The resulting error in this approximation is given by

Q^{1/2} − p_{n_c}(Q_H) = p_{n_c}(Q) − p_{n_c}(Q_H) + V D V^T.

Therefore,

||Q^{1/2} − p_{n_c}(Q_H)||_F ≤ ||p_{n_c}(Q) − p_{n_c}(Q_H)||_F + ||V D V^T||_F
                             ≤ L ||Q_H − Q||_F + (Σ_{j=1}^{m} ε(λ_j)^2)^{1/2}
                             ≤ ε L ||Q||_F + √m max_{1≤j≤m} ε(λ_j)
                             ≤ ε L √m λ_max + √m δ(n_c, λ_max).        (A1)

In the third step, we have used the inequality from equation (12), which bounds the relative error, in the Frobenius norm, of the approximation of the true covariance matrix Q by the H-matrix Q_H. Here L is the Lipschitz constant (see, for example, Wihler [2009]) of the function p_{n_c}(·), and δ(n_c, λ_max) is the maximum error in the polynomial approximation with n_c terms and maximum eigenvalue λ_max, which is computed in the work of Dietrich and Newsam [1995].
Thus, in order for the error due to the approximation by Hierarchical matrices to be consistent with the error due to the truncation of the Chebyshev polynomial approximation to the square root of the matrix, the criterion for choosing ε is

ε ≤ δ(n_c, λ_max) / (L λ_max).        (A2)

In practice, ε = 10^{-9} easily satisfies this criterion even for small values of δ(n_c, λ_max).
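To make the construction concrete, here is a minimal sketch of applying the Chebyshev approximation of the square root to a vector, in the spirit of Dietrich and Newsam [1995] and section 4.2 (illustrative, not the paper's implementation). It requires only matrix-vector products with Q, which may themselves be computed by the H-matrix or FFT based routines; spectral bounds lam_min > 0 and lam_max are assumed known.

    import numpy as np

    def chebyshev_sqrt_apply(Q_matvec, z, lam_min, lam_max, n_c):
        # Coefficients of the degree-n_c Chebyshev interpolant of sqrt(x)
        # on [lam_min, lam_max], computed at the Chebyshev nodes.
        k = np.arange(n_c + 1)
        theta = np.pi * (k + 0.5) / (n_c + 1)
        nodes = 0.5 * (lam_max + lam_min) + 0.5 * (lam_max - lam_min) * np.cos(theta)
        coef = (2.0 / (n_c + 1)) * (np.cos(np.outer(k, theta)) @ np.sqrt(nodes))
        # Three-term recurrence T_{j+1} = 2 t T_j - T_{j-1}, applied to the
        # operator shifted and scaled so its spectrum lies in [-1, 1].
        a = 2.0 / (lam_max - lam_min)
        b = -(lam_max + lam_min) / (lam_max - lam_min)
        T_prev, T_curr = z, a * Q_matvec(z) + b * z
        y = 0.5 * coef[0] * T_prev + coef[1] * T_curr   # k = 0 term is halved
        for j in range(2, n_c + 1):
            T_next = 2.0 * (a * Q_matvec(T_curr) + b * T_curr) - T_prev
            y += coef[j] * T_next
            T_prev, T_curr = T_curr, T_next
        return y   # approximately Q^{1/2} z

Because sqrt(x) is not smooth at x = 0, the number of terms n_c needed for a given remainder δ(n_c, λ_max) grows as lam_min approaches zero; by (A2), the H-matrix tolerance ε can then be chosen so that the truncation error remains the dominant term in the bound (A1).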
[68] Acknowledgment. The authors were supported by NSF Award 0934596, Subsurface Imaging and Uncertainty Quantification.
References
Akcelik, V., G. Biros, O. Ghattas, K. R. Long, and B. B. Waanders (2003), A variational finite element method for source inversion for convective-diffusive transport, Finite Elem. Anal. Des., 39(8), 683–705.
Akcelik, V., G. Biros, A. Draganescu, J. Hill, O. Ghattas, and B. V. B. Waanders (2005), Dynamic data-driven inversion for terascale simulations: Real-time identification of airborne contaminants, in Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, p. 43, IEEE Comput. Soc. Press, Washington, D. C.
Balay, S., W. D. Gropp, L. Curfman McInnes, and B. F. Smith (1997), Efficient management of parallelism in object oriented numerical software libraries, in Modern Software Tools in Scientific Computing, edited by E. Arge, A. M. Bruaset, and H. P. Langtangen, pp. 163–202, Birkhäuser Press, Boston, Mass.
Balay, S., K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. Curfman McInnes, B. F. Smith, and H. Zhang (2008), PETSc Users Manual, Revision 3.0.0, Tech. Rep. ANL-95/11, Argonne Natl. Lab., Lemont, Ill.
Barnes, J., and P. Hut (1986), A hierarchical O(N log N) force-calculation algorithm, Nature, 324, 446–449.
Batu, V. (1998), Aquifer Hydraulics: A Comprehensive Guide to Hydrogeologic Data Analysis, John Wiley, New York.
Bebendorf, M. (2000), Approximation of boundary element matrices, Numer. Math., 86(4), 565–589.
Bebendorf, M. (2005), Hierarchical LU decomposition-based preconditioners for BEM, Computing, 74(3), 225–247.
Bebendorf, M. (2008), Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems, Lect. Notes Comput. Sci. Eng., vol. 63, Springer, New York.
Bebendorf, M., and S. Rjasanow (2003), Adaptive low-rank approximation of collocation matrices, Computing, 70(1), 1–24.
Benzi, M., G. H. Golub, and J. Liesen (2005), Numerical solution of saddle point problems, Acta Numer., 14, 1–137.
Börm, S., L. Grasedyck, and W. Hackbusch (2003), Introduction to hierarchical matrices with applications, Eng. Anal. Boundary Elem., 27(5), 405–422.
Cardiff, M., W. Barrash, P. K. Kitanidis, B. Malama, A. Revil, S. Straface, and E. Rizzo (2009), A potential-based inversion of unconfined steady-state hydraulic tomography, Ground Water, 47(2), 259–270.
Christakos, G. (1984), On the problem of permissible covariance and variogram models, Water Resour. Res., 20(2), 251–265.
Dietrich, C. R., and G. N. Newsam (1995), Efficient generation of conditional simulations by Chebyshev matrix polynomial approximations to the symmetric square root of the covariance matrix, Math. Geol., 27(2), 207–228.
Flath, H. P., L. C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas (2011), Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput., 33(1), 407–432.
Franke, R. (1982), Scattered data interpolation: Tests of some methods, Math. Comput., 38(157), 181–200.
Freund, R. W. (1993), A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems, SIAM J. Sci. Comput., 14(2), 470–482.
Fritz, J., I. Neuweiler, and W. Nowak (2009), Application of FFT-based algorithms for large-scale universal kriging problems, Math. Geosci., 41(5), 509–533.
Golub, G. H., and C. F. Van Loan (1996), Matrix Computations, 3rd ed., Johns Hopkins Univ. Press, Baltimore, Md.
Grasedyck, L., and W. Hackbusch (2003), Construction and arithmetics of H-matrices, Computing, 70(4), 295–334.
Greengard, L., and V. Rokhlin (1987), A fast algorithm for particle simulations, J. Comput. Phys., 73(2), 325–348.
Hager, W. W. (1989), Updating the inverse of a matrix, SIAM Rev., 31(2), 221–239.
Hernandez, V., J. E. Roman, and V. Vidal (2005), SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems, ACM Trans. Math. Software, 31(3), 351–362.
Kitanidis, P. K. (1995), Quasilinear geostatistical theory for inversing, Water Resour. Res., 31(10), 2411–2419.
Kitanidis, P. K. (2007), Bayesian and geostatistical approaches to inverse problems, in On Stochastic Inverse Modeling, Geophys. Monogr. Ser., vol. 171, edited by L. Biegler et al., pp. 19–30, AGU, Washington, D. C.
Kitanidis, P. K. (2010), Bayesian and Geostatistical Approaches to Inverse Problems, pp. 71–85, John Wiley, New York.
Li, W., and O. A. Cirpka (2006), Efficient geostatistical inverse methods for structured and unstructured grids, Water Resour. Res., 42, W06402, doi:10.1029/2005WR004668.
Mason, J. C., and D. C. Handscomb (2003), Chebyshev Polynomials, CRC Press, Boca Raton, Fla.
Matheron, G. (1973), The intrinsic random functions and their applications, Adv. Appl. Probab., 5(3), 439–468.
Michalak, A. M., and P. K. Kitanidis (2003), A method for enforcing parameter nonnegativity in Bayesian inverse problems with an application to contaminant source identification, Water Resour. Res., 39(2), 1033, doi:10.1029/2002WR001480.
Nowak, W., and O. A. Cirpka (2006), Geostatistical inference of hydraulic conductivity and dispersivities from hydraulic heads and tracer data, Water Resour. Res., 42, W08416, doi:10.1029/2005WR004832.
Nowak, W., S. Tenkleve, and O. A. Cirpka (2003), Efficient computation of linearized cross-covariance and auto-covariance matrices of interdependent quantities, Math. Geol., 35(1), 53–66.
Paige, C. C., and M. A. Saunders (1975), Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12(4), 617–629.
Rjasanow, S., and O. Steinbach (2007), The Fast Solution of Boundary Integral Equations, Mathematical and Analytical Techniques with Applications to Engineering, Springer, New York.
Rubin, Y., and S. S. Hubbard (2005), Hydrogeophysics, Springer, New York.
Saad, Y., and M. H. Schultz (1986), GMRES: A generalized minimal residual method for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7(3), 856–869.
Stein, M. L. (1999), Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York.
Wihler, T. P. (2009), On the Hölder continuity of matrix functions for normal matrices, J. Inequal. Pure Appl. Math., 10, 1–5.
Ying, L., G. Biros, and D. Zorin (2004), A kernel-independent adaptive fast multipole algorithm in two and three dimensions, J. Comput. Phys., 196(2), 591–626.
Zanini, A., and P. K. Kitanidis (2009), Geostatistical inversing for large-contrast transmissivity fields, Stochastic Environ. Res. Risk Assess., 23(5), 565–577.

P. K. Kitanidis and A. K. Saibaba, Institute for Computational and Mathematical Engineering, Jen-Hsun Huang Engineering Center, Stanford University, 475 Via Ortega, Stanford, CA 94305-4121, USA. (arvindks@stanford.edu)