Water Resources Research - 2012 - Saibaba - Efficient Methods For Large Scale Linear Inversion Using A Geostatistical
1029/2011WR011778, 2012
19447973, 2012, 5, Downloaded from https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2011WR011778 by Wuhan University, Wiley Online Library on [04/06/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
W05522 SAIBABA AND KITANIDIS: EFFICIENT GEOSTATISTICAL APPROACH W05522
products with the covariance matrix (even using fast summation schemes described below) proves to be extremely expensive.

[7] 3. Uncertainty quantification by generating conditional realizations: an important step is generating unconditional realizations. General procedures such as Cholesky factorization are extremely expensive and special treatment is required.

[8] In this paper, we will deal sequentially with these issues. The covariance matrices, although dense, have special structure which can be exploited. They are similar to dense matrices that arise from the discretization of integral equations. Various fast summation schemes have been devised to provide matrix-vector products in $O(N \log^{\alpha} N)$ for such problems, where $N$ is the number of unknowns and $\alpha \geq 0$ is some integer, depending on the chosen approach. They broadly fall under the following categories: (1) Tree Codes like the Barnes-Hut algorithm [Barnes and Hut, 1986], (2) Fast Multipole Methods [Greengard and Rokhlin, 1987; Ying et al., 2004] and (3) Hierarchical matrices and Adaptive Cross Approximation [Bebendorf, 2000; Börm et al., 2003; Grasedyck and Hackbusch, 2003; Rjasanow and Steinbach, 2007].

[9] Previously, for regular equispaced grids, the Toeplitz or block-Toeplitz structure of the covariance matrix has been exploited to accelerate matrix-vector products to $O(N \log N)$, where $N$ is the number of unknowns. In particular, Nowak and Cirpka [2006] utilized this method to estimate hydraulic conductivity and dispersivities in a large-scale problem. However, these FFT based methods cannot easily be extended to handle nonuniform grids. Fritz et al. [2009] extended the FFT based algorithm to deal with irregularly spaced measurements but did not show how to extend their algorithm to general measurement operators or to the case when the underlying unknowns are not on a regular grid. An algorithm to deal with irregular grids was developed by Li and Cirpka [2006], who exploited the Karhunen-Loève expansion to represent the spatial random field corresponding to a stationary covariance function. They concluded that when the covariance function is smooth, with a large correlation length, the number of terms required in the expansion will be small, so this method would be efficient compared to the FFT based approach on regular grids. However, this method still suffers from the same defect as the FFT based methods, namely, when the measurement operator is dense and the number of measurements is high, forming matrix-matrix products with the covariance matrices is expensive.

[10] Recently [e.g., Rjasanow and Steinbach, 2007; Bebendorf, 2008], Fast Multipole Methods and Hierarchical matrices have gained a lot of popularity in computing fast matrix-vector products for a certain class of dense matrices. Hierarchical matrices appear ideally suited to our current situation because they are applicable to a wide variety of kernels, without much reimplementation, since they handle most operations algebraically. We choose to use the Hierarchical matrix approach over the kernel independent versions of the Fast Multipole Method because Hierarchical matrices are relatively easier to implement. They can also be combined effectively with iterative solvers like GMRES [Saad and Schultz, 1986] to solve linear systems involving these dense matrices.

[11] Our major contributions are as follows. This paper describes methods for the efficient computational implementation of the geostatistical approach when measurements are collected in both space and time. We focus on the linear inversion case, and on cases for which the prior can be described as Gaussian. We will demonstrate that, for large-dimensional problems, these methods can reduce the computational burden by orders of magnitude compared to direct or "naive" implementations that are common in applications. Our algorithm heavily uses the Hierarchical matrix approach and, thus, is not limited to regular equispaced grids, unlike FFT based algorithms. We use a matrix-free iterative solver to compute the maximum a posteriori estimate and, in order to keep the number of iterations small, we have designed an efficient preconditioner that relies on a low-rank representation of the covariance matrix, whose eigenvalues are known to decay rapidly. An important aspect of the geostatistical approach is to be able to quantify the uncertainty of the unknown field, and we perform this by generating conditional realizations from the posterior distribution of the unknowns. We apply our method to interpolation from noisy observations and to the problem of determining the initial contaminant distribution from the time-history of measurements from spatially distributed sensors. This problem has been discussed extensively in the following references [Akcelik et al., 2003, 2005; Flath et al., 2011; Michalak and Kitanidis, 2003].

[12] The rest of this article is organized as follows. Section 2 describes the Geostatistical approach to the linear inverse problem. In section 3.1 we describe in detail the Hierarchical matrix approach to compute fast matrix-vector products involving the dense covariance matrix in a black-box fashion, and section 3 utilizes these fast matrix-vector products along with a Krylov subspace solver to solve the linear inverse problem. We also discuss the implementation details regarding the preconditioner and give a summary of the operation cost of our algorithm. Furthermore, in section 4.1, we present an algorithm for computing approximate conditional realizations based on Chebyshev matrix polynomials, utilizing the fast matrix-vector products described in section 3.1. Finally, we demonstrate the performance of our algorithm in section 5.

2. Problem Formulation

[13] For the sake of completeness, we will review here the problem formulation and the prevalent solution approach. Consider a function $s(x)$, such as log conductivity, to be estimated. Focusing on a prevalent approach, the function is represented as follows

$$s(x) = \sum_{k=1}^{p} f_k(x)\,\beta_k + \epsilon(x), \tag{1}$$

where typically, $f_k \in \mathcal{P}_d^p$, the space of $p$th degree polynomials in $d$ variables, $\beta_k$ are unknown coefficients and the second term is a random function with zero mean and characterized by a covariance function. After discretization (e.g., using finite differences, or finite element models), $s(x)$ is represented by the vector $s \in \mathbb{R}^m$, from a set of noisy measurements $y \in \mathbb{R}^n$

$$y = Hs + v, \qquad v \sim \mathcal{N}(0, R), \tag{2}$$
where $H \in \mathbb{R}^{n \times m}$ is called the measurement operator and $v$ is a random $n$-vector corresponding to measurement error, with mean zero and covariance matrix $R$. We also have,

$$E[s] = X\beta, \qquad E[(s - X\beta)(s - X\beta)^T] = Q, \tag{3}$$

where $X$ is the $m \times p$ drift matrix, whose entries are computed as $X_{ij} = f_j(x_i)$, and $\beta$ are the $p$ unknown drift coefficients. The entries of the covariance matrix $Q$ are given by $Q_{ij} = K(x_i, x_j)$, where $K(\cdot, \cdot)$ is a generalized covariance function which must be conditionally positive definite. For a more detailed discussion on covariance kernels that are permissible, we refer the reader to the following references [Christakos, 1984; Matheron, 1973]. Here $R$, $X$, and $Q$ are considered known and are part of a modeling choice.

[14] Some possible choices for the covariance kernel $K(\cdot, \cdot)$ arise from the Matérn family of covariance kernels [Stein, 1999], corresponding to an isotropic, stationary stochastic process. They are defined as $K_{\alpha,\nu}(x, y) = C_{\alpha,\nu}(r)$, $r = \|x - y\|$ with

$$C_{\alpha,\nu}(r) = \frac{1}{2^{\nu-1}\Gamma(\nu)} (\alpha r)^{\nu} K_{\nu}(\alpha r), \qquad \alpha > 0, \; \nu > 0, \tag{4}$$

where $K_{\nu}$ is the modified Bessel function of the second kind of order $\nu$ and $\Gamma$ is the gamma function. Equation (4) takes special forms for certain parameters $\nu$. For example, when $\nu = 0.5$, $C_{\alpha,\nu}$ corresponds to the exponential covariance function; when $\nu = 0.5 + n$, where $n$ is an integer, $C_{\alpha,\nu}$ is the product of an exponential covariance and a polynomial of order $n$. In the limit as $\nu \to \infty$, and for appropriate scaling of $\alpha$, $C_{\alpha,\nu}$ converges to the Gaussian covariance.

[15] Since the prior probability distribution of the unknowns $s$ and the probability density of the error are both Gaussian, the prior and noise pdfs can be written as follows

$$p(s \mid \beta) \propto \exp\left(-\tfrac{1}{2}(s - X\beta)^T Q^{-1} (s - X\beta)\right), \qquad p(y \mid s) \propto \exp\left(-\tfrac{1}{2}(y - Hs)^T R^{-1} (y - Hs)\right). \tag{5}$$

Using Bayes' theorem, and assuming a uniform prior for $\beta$, i.e., $p(\beta) \propto 1$, so that $p(s, \beta) \propto p(s \mid \beta)$, the prior probability distribution of the unknowns and the probability distribution of the error can be combined to give the posterior probability distribution of the unknown parameters $s$ and $\beta$

$$p(s, \beta \mid y) = \frac{p(y \mid s, \beta)\, p(s, \beta)}{p(y)} \propto p(y \mid s, \beta)\, p(s, \beta). \tag{6}$$

Plugging in the expression for the respective terms, and using an alternative convenient notation, we find that

$$p(s, \beta \mid y) \propto \exp\left(-\tfrac{1}{2}\|s - X\beta\|^2_{Q^{-1}} - \tfrac{1}{2}\|y - Hs\|^2_{R^{-1}}\right), \tag{7}$$

where $\|x\|^2_M = x^T M x$ defines a norm when $M$ is a positive definite matrix. The resulting posterior distribution is Gaussian, as well. The posterior mean values, $\hat{s}$ and $\hat{\beta}$, are given by the maximum a posteriori estimate, which is equivalent to solving a weighted least squares optimization problem

$$\arg\min_{s,\beta} \; \tfrac{1}{2}\|s - X\beta\|^2_{Q^{-1}} + \tfrac{1}{2}\|y - Hs\|^2_{R^{-1}}. \tag{8}$$

[16] For $n < m$, it is more convenient to compute the solution to this optimization problem (8) by first obtaining the solution of the following linear system of equations,

$$\begin{pmatrix} HQH^T + R & HX \\ (HX)^T & 0 \end{pmatrix} \begin{pmatrix} \hat{\xi} \\ \hat{\beta} \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix}, \tag{9}$$

and then computing the resulting unknown field from the solution of the system of equations (9) by the following transformation

$$\hat{s} = X\hat{\beta} + QH^T\hat{\xi}. \tag{10}$$

We also denote $\Psi \overset{\mathrm{def}}{=} HQH^T + R$, $\Phi \overset{\mathrm{def}}{=} HX$, and we define the matrix $A$ as

$$A = \begin{pmatrix} \Psi & \Phi \\ \Phi^T & 0 \end{pmatrix}. \tag{11}$$

[17] A straightforward implementation to solve the maximum a posteriori (MAP) problem can be done by constructing the system of equations in $O(m^2 n + mn^2 + mnp)$ operations and solving the dense linear system using a direct solver such as Gaussian elimination in $O((n + p)^3)$. The storage costs are dominated by the dense $Q$ matrix, which is of size $O(m^2)$. This cost can be significant in a problem of size $m \sim 10^6$ or higher, which can easily occur in practice. However, this difficulty is usually circumvented by computing $QH^T$ without constructing $Q$. This involves $n$ matrix-vector products involving $Q$. Constructing the elements of $QH^T$ in a direct fashion requires significant CPU time [Nowak et al., 2003]. Next, we will discuss a method for solving these systems that relies on the fast multiplication of $Q$ with a vector.

3. Solving the System

3.1. Hierarchical Matrices

[18] Hierarchical matrices [Börm et al., 2003; Grasedyck and Hackbusch, 2003; Rjasanow and Steinbach, 2007; Bebendorf, 2008] (or H-matrices, for short) are efficient data-sparse representations of certain densely populated matrices. The main idea that is used repeatedly in these kinds of techniques is to split a given matrix into a hierarchy of rectangular blocks and approximate each of the blocks by a low-rank matrix. Hierarchical matrices have been used successfully in the data-sparse representation of matrices arising in the Boundary Element method or for the approximation of the inverse of a Finite Element discretization of an elliptic partial differential operator. Fast algorithms have been developed for this class of matrices, including matrix-vector products, matrix addition, multiplication and factorization in almost linear complexity [Börm et al., 2003].
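The low-rank approximation of a single block can be illustrated with a short sketch. It is not the authors' implementation: it assumes a one-dimensional Gaussian covariance $\exp(-(x - y)^2)$, two well-separated point clusters, and a simplified partially pivoted Adaptive Cross Approximation; the names `gaussian_kernel` and `aca` are hypothetical. The point is that only a few rows and columns of the block are ever evaluated.

```python
import math

def gaussian_kernel(x, y):
    """Gaussian covariance exp(-(x - y)^2) between two points on a line (assumed kernel)."""
    return math.exp(-(x - y) ** 2)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def aca(entry, n_rows, n_cols, tol=1e-8, max_rank=30):
    """Simplified partially pivoted ACA of an n_rows x n_cols block.

    `entry(i, j)` returns one matrix entry on demand, so the full block is
    never formed. Returns (us, vs) with block ~= sum_k outer(us[k], vs[k]).
    """
    us, vs = [], []
    used_rows = set()
    i = 0            # first pivot row: an arbitrary choice
    frob2 = 0.0      # squared Frobenius norm of the current approximation
    for _ in range(min(max_rank, n_rows, n_cols)):
        used_rows.add(i)
        # Residual of pivot row i: original row minus current approximation.
        row = [entry(i, j) - sum(u[i] * v[j] for u, v in zip(us, vs))
               for j in range(n_cols)]
        j = max(range(n_cols), key=lambda c: abs(row[c]))
        if abs(row[j]) < 1e-14:
            break    # residual row is numerically zero
        v_new = [val / row[j] for val in row]
        # Residual of pivot column j.
        u_new = [entry(r, j) - sum(u[r] * v[j] for u, v in zip(us, vs))
                 for r in range(n_rows)]
        # Update the norm of the approximation without forming it.
        cross = sum(dot(u_new, u) * dot(v_new, v) for u, v in zip(us, vs))
        step2 = dot(u_new, u_new) * dot(v_new, v_new)
        frob2 += 2.0 * cross + step2
        us.append(u_new)
        vs.append(v_new)
        if math.sqrt(step2) <= tol * math.sqrt(frob2):
            break    # the latest rank-one update is negligible
        # Next pivot row: largest residual entry in the new column.
        candidates = [r for r in range(n_rows) if r not in used_rows]
        if not candidates:
            break
        i = max(candidates, key=lambda r: abs(u_new[r]))
    return us, vs

xs = [k / 49 for k in range(50)]        # cluster in [0, 1]
ys = [2 + k / 49 for k in range(50)]    # well-separated cluster in [2, 3]
us, vs = aca(lambda i, j: gaussian_kernel(xs[i], ys[j]), len(xs), len(ys))

# Check the approximation against the explicitly formed block.
err2 = norm2 = 0.0
for i in range(len(xs)):
    for j in range(len(ys)):
        exact = gaussian_kernel(xs[i], ys[j])
        approx = sum(u[i] * v[j] for u, v in zip(us, vs))
        err2 += (exact - approx) ** 2
        norm2 += exact ** 2
rel_err = math.sqrt(err2 / norm2)
print(f"rank {len(us)} of {len(xs)}, relative Frobenius error {rel_err:.2e}")
```

A full H-matrix applies this kind of compression to every admissible block of a block-cluster tree and stores the remaining near-diagonal blocks densely; that bookkeeping is omitted here.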
[19] The covariance matrix $Q$ has entries $Q_{ij} = K(x_i, x_j)$ for a set of points $\{x_i\}_{i=1}^{N}$ for a certain class of generalized covariance functions $K(\cdot, \cdot)$. Using the Hierarchical matrix approach, it can be shown that the cost of storing $Q$ and of multiplying $Q$ and $Q^T$ by an appropriately sized vector is $O(km \log m)$, where $m$ is the number of unknowns, and the numerical cost for approximating the whole matrix is $O(k^2 m \log m)$, where $k$ is the block-wise rank, which is chosen such that the relative Frobenius-norm error of each subblock is $\varepsilon$. For a rigorous analysis, the reader is referred to [Bebendorf, 2000; Bebendorf and Rjasanow, 2003]. Thus, the approximation for the H-matrix is of the form

$$\|Q - Q_H\|_F \leq \varepsilon \|Q\|_F. \tag{12}$$

Our implementation closely follows the methods described in the following references [Rjasanow and Steinbach, 2007; Bebendorf, 2005]: the construction of the cluster tree and block-cluster tree is based on the bisection method described in the work of Rjasanow and Steinbach [2007], whereas the low-rank approximation to the subblocks is computed by using the Partially Pivoted Adaptive Cross Approximation algorithm [Bebendorf and Rjasanow, 2003; Bebendorf, 2000, 2008].

3.2. Iterative Solver

[20] The system of equations (9) is solved iteratively using a Krylov subspace method. The key advantage of using Krylov subspace methods is that they never require the explicit entries of the matrix but only rely on matrix-vector products involving the matrix or its transpose. We review the basic properties of Krylov subspace methods that help us understand the convergence of the algorithms. These methods, when applied to the linear system of equations $Ax = b$, satisfy the following relation for the residual at the $i$th iteration

$$r_i \in \mathrm{span}\{r_0, Ar_0, A^2 r_0, \ldots, A^{i} r_0\}, \qquad r_i = \phi(A)\, r_0, \quad \phi \in \mathcal{P}_i, \tag{13}$$

where $r_i = b - Ax_i$ is the residual at the $i$th iteration and $\mathcal{P}_i$ is the set of polynomials of degree at most $i$ with value 1 at the origin. Minimal residual methods such as MINRES [Paige and Saunders, 1975] or GMRES [Saad and Schultz, 1986] try to compute a polynomial such that

$$\|r_i\| = \min_{\phi \in \mathcal{P}_i} \|\phi(A)\, r_0\|. \tag{14}$$

Further, if $A$ is diagonalizable with $A = V \Lambda V^{-1}$, we then have the following tight bound [Benzi et al., 2005]

$$\|r_i\| \leq \|r_0\| \,\mathrm{cond}(V) \min_{\phi \in \mathcal{P}_i} \max_j |\phi(\lambda_j)|. \tag{15}$$

A bound of the type in equation (15) provides intuition of how the eigenvalue distribution influences the worst case behavior. For example, if all the eigenvalues are clustered tightly around a single point that is far away from the origin, we can expect rapid convergence. Wide-spread eigenvalues, however, would lead to slow convergence. In order to keep the number of iterations reasonable, sometimes efficient preconditioners are necessary, which transform the linear system into another linear system that has more favorable properties. For more details regarding the convergence of the Krylov subspace methods, the interested reader is referred to the following references [Benzi et al., 2005; Golub and Van Loan, 1996]. Generally speaking, a good preconditioner is a reasonable approximation to the inverse of a matrix, should be cheap to construct and to apply, and its application either clusters the eigenvalues or reduces the condition number of the matrix, or both. We observed that in certain instances, the number of iterations required for convergence of our system was quite large and thus, we devise a preconditioner to reduce the number of iterations. We will show that the choice of this particular preconditioner clusters several of the eigenvalues around 1, resulting in fewer iterations.

[21] We make the following assumptions while determining the cost of our algorithm: the matrix $R$ is diagonal and the cost of a matrix-vector product involving $H$ and $H^T$ is given by $\mu$. For example, if $H$ is dense then $\mu = mn$ and if $H$ is sparse then $\mu = \mathrm{nnz}$, where $\mathrm{nnz}$ is the number of nonzeros that $H$ has.

[22] The matrix $A$ is never explicitly constructed, since $Q$ is never constructed explicitly. The cost for setting up $A$ is eliminated but the cost of setting up $Q$, using the H-matrix approach, and forming $\Phi$ is $O(\mu p + k^2 m \log m)$, where $k$ is the block rank used in the H-matrix approach. The action of the matrix $\Psi$ on a vector $x$ of size $n$, written as $q = \Psi x$, can be computed in three steps: compute the adjoint of the measurement operator $z \leftarrow H^T x$, compute the matrix-vector product $w \leftarrow Qz$ approximately using the H-matrix approach described in section 3.1 and finally compute $q \leftarrow Hw + Rx$. The cost of computing one matrix-vector product of matrix $A$ involves the following operations: (1) multiplication with $Q$, (2) multiplications with $H$, $H^T$ and $R$ and finally, (3) multiplication with $\Phi$ and $\Phi^T$ on appropriately sized vectors. The cost per matrix-vector product of $A$ is $O(km \log m + 2\mu + np + n)$.

3.3. Preconditioner

[23] In several applications, the covariance matrix $Q$ has several eigenvalues that are practically zero. For such cases, the covariance matrix $Q$ can be approximated, for the purposes of computing a useful preconditioner, by a low-rank decomposition, by retaining only the $r$ largest eigenvalues and corresponding eigenvectors

$$Q \approx V_r \Lambda_r V_r^T, \tag{16}$$

where the columns of $V_r$ contain the $r$ eigenvectors corresponding to the largest eigenvalues that form the diagonal of the matrix $\Lambda_r$. Of course, if we compute nearly all the eigenvalues and eigenvectors of $Q$, then we can use the eigenvalue decomposition as a surrogate for the matrix $Q$ itself. However, this is extremely expensive in practice. To compute a preconditioner, we only need an approximation to $Q$ which need not necessarily be very accurate. We have found that the number of eigenvalues and eigenvectors that need to be retained to compute such a low-rank representation is often on the order of 100 and is nearly independent of the dimension of the problem. The relative error in constructing such an approximation is

$$\epsilon_r \overset{\mathrm{def}}{=} \frac{\|Q - V_r \Lambda_r V_r^T\|_2}{\|Q\|_2} = \frac{\lambda_{r+1}}{\lambda_1}. \tag{17}$$
We use a matrix-free eigenvalue solver, for example the Krylov-Schur method, to compute the dominant eigenvalues and eigenvectors of the covariance matrix $Q$, with matrix-vector products accelerated in the usual fashion using Hierarchical matrices, described in section 3.1.

[24] For such matrices, we can use the above observation to construct an approximate inverse for the matrix $\Psi$, in the steps listed below. As before, we make the following assumptions: the matrix $R$ is diagonal and the cost of a matrix-vector product involving $H$ and $H^T$ is given by $\mu$.

[25] 1. Compute the low-rank representation of the matrix $Q$ to obtain $Q \approx V_r \Lambda_r V_r^T$. The number of iterations required to compute the dominant portion of the spectrum using the Krylov-Schur method is independent of the size of the system and the work required for computing this low-rank representation is $O(rkm \log m)$, where $k$ is the block rank of the covariance matrix and $r$ is the number of eigenvalues and eigenvectors of $Q$ that we compute.

[26] 2. Form the matrix

$$M = R^{-1/2} H V_r \Lambda_r^{1/2}.$$

The cost of this step is $O(\mu r + r)$.

[27] 3. Compute the singular value decomposition (SVD) of the matrix $M$ as defined above

$$M = U \Sigma V^T,$$

which can be computed in $O(nr^2)$ assuming that $n > r$.

[28] 4. Use the Sherman-Morrison-Woodbury update [Hager, 1989] to compute the inverse

$$(MM^T + I)^{-1} = I - U D_r U^T,$$

where $D_r = \mathrm{diag}\left(\dfrac{\sigma_i^2}{1 + \sigma_i^2}\right)$ and $\sigma_i$, $i = 1, \ldots, \min\{n, r\}$ are the singular values of $M$. The only cost is forming $D_r$, which is $O(r)$.

[29] 5. Compute the approximate inverse of $\Psi$, which we denote by $\hat{\Psi}^{-1}$, as

$$\hat{\Psi}^{-1} = R^{-1} - R^{-1/2} U D_r U^T R^{-1/2}.$$

[30] This can be computed in $O(nr)$.

[31] The cost of applying $\hat{\Psi}^{-1}$ to a vector $x$ is simply $O(nr)$ and is thus independent of the number of unknowns $m$.

[32] Finally, we show how to use $\hat{\Psi}^{-1}$ to construct a preconditioner for our system. We use the block diagonal form [Benzi et al., 2005], which is commonly used in preconditioning matrices arising from saddle point problems. Thus, the preconditioner to the matrix defined in equation (11) is

$$P^{-1} = \begin{bmatrix} \hat{\Psi}^{-1} & \\ & \hat{S}^{-1} \end{bmatrix}, \qquad \hat{S} = \Phi^T \hat{\Psi}^{-1} \Phi. \tag{18}$$

3.4. Implementation Details

... the choice of the Krylov subspace method. The matrix $A$, as defined in equation (11), is symmetric but indefinite. This precludes the use of the Conjugate Gradient algorithm. Thus, we use a restarted GMRES(50) solver for the linear system of equations (9). We note that other possible choices are MINRES [Paige and Saunders, 1975] or Transpose Free QMR [Freund, 1993]. The low-rank representation of the matrix $Q$, which is required in section 3.3, is computed using SLEPc [Hernandez et al., 2005], which is a package related to PETSc. We use the default Krylov-Schur option in SLEPc to compute the dominant eigenvalues of $Q$.

3.5. Other Approaches

[34] For any iterative solver applied to an arbitrary linear system of equations, it is difficult to guarantee convergence to a desired tolerance in a reasonable number of iterations. We briefly discuss an alternative approach to solving the system of equations (9). Instead of using an iterative solver, we can compute $QH^T$ using fast matrix-vector products using the H-matrix approach. Subsequently, we form the dense matrix and solve the system of equations (9) using a direct solver such as Gaussian elimination. However, it should be noted that this process can be expensive for a large number of measurements $n$ because forming $QH^T$ requires $O(mn \log m)$ operations. Moreover, solving the system of equations has a complexity of $O((n + p)^3)$, which can be rather high both in terms of storage and computation time for large $n$. By comparison, the iterative solver, when it converges in a few iterations, requires far fewer matrix-vector products involving $Q$ (in practice, much smaller than the number of measurements) than the direct solver approach. Therefore, we recommend using this approach only when the number of measurements is small or if the iterative solver fails to converge to a desired tolerance.

4. Uncertainty Quantification and Conditional Realizations

[35] A commonly used strategy [Kitanidis, 1995; Zanini et al., 2009] to quantify the uncertainty associated with the estimate of the solution is via computing conditional realizations. This method avoids the computation of the posterior covariance matrix, which is expensive for large-scale problems.

4.1. Conditional Realizations

[36] We first generate unconditional realizations $v_u \sim \mathcal{N}(0, R)$ and $s_u \sim \mathcal{N}(0, Q)$, which are realizations corresponding to the pdf of the noise and the prior probability of the unknown parameters $s$. The conditional realizations [Kitanidis, 1995; Zanini et al., 2009] are generated by first solving the linear inversion system with a modified right hand side,

$$\begin{pmatrix} HQH^T + R & HX \\ (HX)^T & 0 \end{pmatrix} \begin{pmatrix} \hat{\xi} \\ \hat{\beta} \end{pmatrix} = \begin{pmatrix} y - Hs_u + v_u \\ 0 \end{pmatrix}, \tag{19}$$
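Because system (19) shares its matrix with system (9) and differs only in the right hand side, a single small dense reference solver can serve both. The sketch below is illustrative only and is not the authors' implementation: it assumes an exponential covariance, point measurements (so $H$ simply selects entries of $s$), a constant drift ($X$ is a column of ones, $p = 1$) and $R = \sigma^2 I$. The names `gauss_solve` and `saddle_point_estimate` are hypothetical, and the matrix is formed densely, which is only sensible for tiny problems.

```python
import math

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def saddle_point_estimate(points, obs_idx, rhs, sigma2, alpha=1.0):
    """Solve [[Psi, Phi], [Phi^T, 0]] [xi, beta] = [rhs, 0] and map back to the field.

    Assumptions (not from the paper): Q_ij = exp(-alpha |x_i - x_j|),
    H picks the entries listed in obs_idx, X is a column of ones, R = sigma2 * I.
    """
    m, n = len(points), len(obs_idx)
    Q = [[math.exp(-alpha * abs(a - b)) for b in points] for a in points]
    # Psi = H Q H^T + R is Q restricted to the observed points plus the noise variance;
    # Phi = H X is a column of ones.
    A = [[Q[obs_idx[i]][obs_idx[j]] + (sigma2 if i == j else 0.0) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    sol = gauss_solve(A, list(rhs) + [0.0])
    xi, beta = sol[:n], sol[n]
    # Field estimate: X beta + Q H^T xi.
    s_hat = [beta + sum(Q[i][obs_idx[j]] * xi[j] for j in range(n)) for i in range(m)]
    return s_hat, xi, beta

points = [k / 8 for k in range(9)]   # 9 unknowns on [0, 1]
obs_idx = [0, 4, 8]                  # 3 point measurements
y = [1.0, 2.0, 1.5]
s_hat, xi, beta = saddle_point_estimate(points, obs_idx, y, sigma2=1e-6)
print([round(val, 4) for val in s_hat])
```

Passing the measurements `y` as `rhs` gives the estimate of the mean field; passing the modified right hand side of system (19) reuses the same solver unchanged.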
Note that the system of equations (19) has the same left hand side as the original system of equations (9), with only a modification in the right hand side. Thus, the implementation of conditional simulations can be done without much recoding.

4.2. Generating Unconditional Realizations

[37] We now address the issue of generating unconditional realizations $v_u \sim \mathcal{N}(0, R)$ and $s_u \sim \mathcal{N}(0, Q)$. In most applications, we assume the noise to be uncorrelated, that is, we assume that $R$ is a diagonal matrix. Thus, we can easily generate the unconditional realization $v_u = R^{1/2} x$, where $x \sim \mathcal{N}(0, I)$. However, generating the unconditional realization $s_u$ is much more expensive because $Q$ is dense. A procedure that is often mentioned in the literature is to form the Cholesky decomposition $Q = F^T F$, where $F$ is an upper triangular matrix. We can then compute $s_u = F^T x$, where $x \sim \mathcal{N}(0, I)$. But since $Q$ is dense, Cholesky factorization, which scales as $O(m^3)$, where $m$ is the number of unknown parameters, becomes very expensive to compute. Moreover, Cholesky factorization requires explicit storage of $Q$, which is not feasible.

[38] We will describe a more efficient approach to sampling, which relies on a polynomial approximation to the square root and fast matrix-vector products. The only difference between our approach and that of Dietrich and Newsam [1995], described below, is the manner in which the matrix-vector products are computed: our approach uses the H-matrix approach, whereas their approach uses FFT based methods. To sample from $\mathcal{N}(0, Q)$, we need to compute

$$s_u = Q^{1/2} x, \qquad x \sim \mathcal{N}(0, I).$$

This meets our requirements because

$$E[s_u] = 0, \qquad E[s_u s_u^T] = Q^{1/2} E[xx^T] Q^{1/2} = Q.$$

We adopt a procedure similar to that of Dietrich and Newsam [1995], which exploits the ability to form fast matrix-vector products involving the matrix $Q$. The method tries to approximate the square root of a matrix using Chebyshev matrix polynomials that can be generated using a three term recurrence series. The explicit computation of $Q^{1/2}$ is not necessary. The quantity of interest is $s_u = Q^{1/2} x$, where $x \sim \mathcal{N}(0, I)$, whose polynomial approximation only requires matrix-vector products involving $Q$ and can be computed as

$$s_u \approx \sum_{i=0}^{L} a_i x_i,$$

where ... $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of the matrix $Q$. Thus, in order to generate unconditional realizations $s_u \sim \mathcal{N}(0, Q)$, we need access to only the upper and lower eigenvalues and fast matrix-vector products involving the covariance matrix $Q$, which we compute using the H-matrix approach detailed in section 3.1. The extreme eigenvalues can be computed using a matrix-free Krylov-Schur algorithm, which we access via the SLEPc package. In Appendix A, we extend the analysis of Dietrich and Newsam [1995] to derive an error bound for the approximation of the square root of the matrix using Chebyshev polynomials to include approximation errors due to the Hierarchical matrix approach.

[39] In practice, the number of terms in the Chebyshev series necessary to obtain the desired tolerance of approximation grows proportional to the square root of the largest eigenvalue [Dietrich and Newsam, 1995]. Furthermore, whenever the smallest eigenvalue of $Q$ tends to zero, the convergence of the Chebyshev series will be slow, which is characteristic of any polynomial function approximation to the square root, and is because the origin is the singularity point of the first derivative of $\sqrt{x}$. A workaround for this problem is to use the so-called "nugget effect," which involves adding a diagonal term to the covariance matrix and has the form $Q + \delta I$, with $\delta > 0$, so that the matrix has a spectrum that lies in the interval $[\delta, \lambda_{\max} + \delta]$ and $\sqrt{x}$ is approximated in the very same interval. This is the same as computing the Chebyshev expansion for $\sqrt{x + \delta}$. The use of the nugget effect makes the number of terms in the Chebyshev expansion small, which makes the algorithm more efficient but adds some small-scale variability that is often negligible in practice.

5. Numerical Experiments

[40] We now demonstrate the performance of our algorithm on a few applications. We apply it to two applications: interpolation involving noisy measurements on a standard test function and contaminant source identification. In the first application, the measurement operator is extremely sparse and in the second application, the measurement operator is not constructed explicitly but we only have access to matrix-vector products of $H$ and $H^T$. The above method was implemented in C++ using PETSc libraries for the linear solvers. All test cases were run on a workstation with an Intel Xeon E7540 2 GHz processor with 128 GB RAM, running Ubuntu 11.04.

5.1. Application: Interpolation

[41] To test our algorithm, we use a function introduced by Franke [Franke, 1982], which is frequently used as a test case for interpolation algorithms.
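For reference, Franke's function is usually written as a sum of four Gaussian-like bumps on the unit square. The definition below is the standard form found in the interpolation literature; it is reproduced here as an assumption, since the formula itself does not appear in this text. The measurement locations and the noise level in `noisy_samples` are likewise illustrative choices, and both function names are hypothetical.

```python
import math
import random

def franke(x, y):
    """Franke's test function on the unit square (standard four-term form)."""
    t1 = 0.75 * math.exp(-((9 * x - 2) ** 2 + (9 * y - 2) ** 2) / 4)
    t2 = 0.75 * math.exp(-((9 * x + 1) ** 2) / 49 - (9 * y + 1) / 10)
    t3 = 0.5 * math.exp(-((9 * x - 7) ** 2 + (9 * y - 3) ** 2) / 4)
    t4 = -0.2 * math.exp(-((9 * x - 4) ** 2) - (9 * y - 7) ** 2)
    return t1 + t2 + t3 + t4

def noisy_samples(n, noise_std, seed=0):
    """Draw n random points in the unit square and return (x, y, noisy value) triples."""
    rng = random.Random(seed)   # seeded so the sketch is reproducible
    samples = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        samples.append((x, y, franke(x, y) + rng.gauss(0.0, noise_std)))
    return samples

# 200 noisy point measurements; the noise standard deviation is an assumed value.
measurements = noisy_samples(n=200, noise_std=0.01)
print(len(measurements), round(franke(0.5, 0.5), 4))
```

Each triple corresponds to one row of the sparse measurement operator in the interpolation problem: a single point evaluation of the unknown field plus noise.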
Figure 2. (left) $\|\nabla F\|$ for Franke's function and (right) reconstruction of the derivative $\|\nabla \hat{s}\|$ from the geostatistical approach for $m = 3600$ and $n = 200$. The $L^2$ relative error is 0.1965.
Table 1. Performance of the Iterative Scheme for the Interpolation Problem Without Preconditioner^a
Unknowns | Measurements | Iterations | Relative Error

Table 3. Performance of the Iterative Scheme for Various Covariance Kernels^a
Kernel K(r) | Iterations | Relative Error
number of iterations required to converge to the desired tolerance remains bounded for a moderate number of measurements (on the order of 1000) but the number of iterations diverges when the number of sensors is increased dramatically. Under such circumstances, we increase the number of dominant eigenvalues and eigenvectors, $r$, that we use to approximate $Q$ in the preconditioner. We also observe that for a given number of measurements, the number of iterations it took to converge does not depend on the number of unknowns, and in this sense, the convergence is essentially grid independent. This is a very desirable property because the algorithm scales to an arbitrarily large number of unknowns. In all, the maximum problem size that we solved had $9 \times 10^4$ inversion parameters and 1.8 million space-time unknowns.

[55] We now consider the problem with a large number of measurements. We pick the number of sensors to be $25 \times 25$ and since we take $n_t = 10$ measurements in time, altogether we have 6,250 measurements. We increase $r$, the number of eigenvalues and eigenvectors of $Q$ that we compute, and study the convergence of the iterative method on this problem. With increasing $r$, the approximation of the eigendecomposition to the matrix $Q$ improves and this results in a better preconditioner because it approximates the matrix $A$ better. The downside is that the setup cost of the preconditioner also increases significantly because not only do we have to compute more eigenvalues and eigenvectors of $Q$ but we also need to multiply with $H$ $r$ times (see step 2 in section 3.3), which is expensive because each matrix-vector product with $H$ involves solving the transient advection-diffusion equation. We set the maximum number of iterations to be 300. In Table 5 we list the number of iterations it took for a given $r$ and the resulting $L^2$ relative error of the computation. We observe that
with increasing r, the number of iterations that it takes to converge to the desired tolerance d decreases (in this case we use the same condition that we do for interpolation, $\|r_k\| / \|r_0\| < 10^{-6}$). Therefore, we can view this scenario as a tradeoff between setup time and computation time. If we have to solve the linear inverse problem for multiple right-hand sides, i.e., for multiple measurements, then it is worthwhile to compute the preconditioner with a larger r, which will result in faster convergence.

Table 5. Performance of Iterative Scheme With Increasing r for Grid Size 100 × 100 and Number of Sensors 25 × 25, so That the Number of Measurements Is 6250
r      Iterations    $\lambda_r$                Relative Error
129    300^a         $2.45 \times 10^{-5}$      0.0148
201    196           $4.76 \times 10^{-6}$      0.0146
278    96            $1.75 \times 10^{-6}$      0.0145
355    41            $7.09 \times 10^{-7}$      0.0144
^a Reached the maximum number of iterations, 300, without converging to the desired solver tolerance.

5.2.3. Spectrum of the Preconditioned Matrix
[56] To understand the working of the preconditioner, we plot the spectrum of the preconditioned operator. Since it is extremely expensive to compute the full spectrum of the operator, we restrict ourselves to the case in which the number of sensors is 10 × 10 and the number of spatially varying unknowns is at most 100 × 100. In setting up the preconditioner, we explicitly compute 100 dominant eigenvalues and their corresponding eigenvectors. Even though we request only 100 eigenvalues from the SLEPc package, in practice a few more eigenvalues converge (135) in the Krylov-Schur iterations, and so we use these in our preconditioner as well. In Figure 6, we plot the absolute values of the eigenvalues of the preconditioned operator. As can be seen in Figure 6, most of the eigenvalues cluster around 1, and the spectrum is more or less independent of the number of unknowns. This explains the grid-independent convergence that we observe for the problem sizes that we considered. We observed a similar trend for different numbers of sensors as well.

5.2.4. Comparison With FFT Based Methods on Regular Grids
[57] We conclude this application with a comparison of our algorithm with an FFT based approach on regular grids. For the setup of the problem we pick 10 × 10 sensors with $n_t = 10$ and t = 0.05. The regular grid is constructed for a problem size m = 300 × 300. Everything else remains the same as in section 5.2. For comparison purposes we pick the following three strategies.
[58] 1. Strategy 1: the matrix-vector products are computed using the FFT based method. We form $QH^T$ first, then form the matrix A and solve the system of equations using a direct solver.
[59] 2. Strategy 2: the matrix-vector products are computed using the FFT based method. The rest of the algorithm is the same as the one described in section 3.
[60] 3. Strategy 3: we use the algorithm described in section 3, with matrix-vector products computed using the H-matrix approach.
[61] The first strategy is the direct implementation of FFT based methods that is prevalent in the literature. We note that it can be expensive because it involves constructing H, which, as we have already discussed, can be extremely expensive. Thus, we propose a second alternative, which is to use our original algorithm but modify it such that the matrix-vector products are computed using FFT based approaches as opposed to the H-matrix approach. These two strategies were implemented using MATLAB. This might not be a fair comparison because we are comparing two different programming languages and programming styles.
[62] Table 6 compares the time taken to set up and solve the system of equations for the three different strategies. As expected, the bulk of the computational time in Strategy 1
Figure 6. (left) Eigenvalues of the preconditioned operator. (right) Plot of $\#\{\lambda_i : |\lambda_i - 1| > x\}$ against x for all eigenvalues. For these plots, we assumed the number of observations to be 10 × 10 × 10; the number of unknowns varied from 20 × 20 to 100 × 100. The covariance kernel is defined in equation (22).
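The counting function plotted in the right panel of Figure 6 can be illustrated with a minimal Python sketch. The function names (`count_outside_cluster`, `cluster_curve`) and the example spectrum are hypothetical and invented for illustration; they are not data or code from the paper, whose eigenvalues were computed with SLEPc.

```python
from typing import Iterable, List


def count_outside_cluster(eigenvalues: Iterable[complex], x: float) -> int:
    """Return #{lambda_i : |lambda_i - 1| > x} for the given spectrum."""
    return sum(1 for lam in eigenvalues if abs(lam - 1.0) > x)


def cluster_curve(eigenvalues: Iterable[complex],
                  thresholds: Iterable[float]) -> List[int]:
    """Evaluate the count at each threshold x, giving one point per x."""
    eigs = list(eigenvalues)  # materialize once so it can be scanned repeatedly
    return [count_outside_cluster(eigs, x) for x in thresholds]


# Hypothetical spectrum (assumption, not from the paper): most values sit
# close to 1 and a few outliers sit farther away.
example_spectrum = [1.0, 1.001, 0.999, 1.02, 0.97, 1.3, 0.5, 2.0]
example_thresholds = [0.0005, 0.01, 0.1, 0.4, 0.9]

if __name__ == "__main__":
    counts = cluster_curve(example_spectrum, example_thresholds)
    for x, n in zip(example_thresholds, counts):
        print(f"x = {x:g}: {n} eigenvalues with |lambda - 1| > x")
```

For this made-up spectrum the counts are 7, 5, 3, 2, 1 as x grows. A curve that drops quickly toward zero for small x indicates tight clustering around 1, which is the property associated with fast convergence of Krylov solvers.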
consistent with the error due to truncation of the Chebyshev polynomial approximation to the square root of the matrix, the criterion for choosing is

(nc ; max) / (L max).   (A2)

In practice, = $10^{-9}$ easily satisfies this criterion even for small values of (nc ; max).

[68] Acknowledgment. The authors were supported by NSF Award 0934596, Subsurface Imaging and Uncertainty Quantification.

P. K. Kitanidis and A. K. Saibaba, Institute for Computational and Mathematical Engineering, Jen-Hsun Huang Engineering Center, Stanford University, 475 Via Ortega, Stanford, CA 94305-4121, USA. (arvindks@stanford.edu)