
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/258462630

Sparse Downscaling and Adaptive Fusion of Multi-sensor Precipitation

Article · December 2011


All content following this page was uploaded by Ardeshir M. Ebtehaj on 26 April 2014.



WATER RESOURCES RESEARCH, VOL. 49, 5944–5963, doi:10.1002/wrcr.20424, 2013

On variational downscaling, fusion, and assimilation of hydrometeorological states: A unified framework via regularization
A. M. Ebtehaj1,2 and E. Foufoula-Georgiou1
Received 2 October 2012; revised 10 May 2013; accepted 13 July 2013; published 23 September 2013.

[1] Improved estimation of hydrometeorological states from down-sampled observations
and background model forecasts in a noisy environment has been a subject of growing
research in the past decades. Here we introduce a unified variational framework that ties
together the problems of downscaling, data fusion, and data assimilation as ill-posed inverse
problems. This framework seeks solutions beyond the classic least squares estimation
paradigms by imposing a proper regularization, expressed as a constraint consistent with the
degree of smoothness and/or probabilistic structure of the underlying state. We review
relevant smoothing norm regularization methods in derivative space and extend classic
formulations of the aforementioned problems with particular emphasis on land surface
hydrometeorological applications. Our results demonstrate that proper regularization of
downscaling, data fusion, and data assimilation problems can lead to more accurate and
stable recovery of the underlying non-Gaussian state of interest with improved performance
in capturing isolated and jump singularities. In particular, we show that the Huber
regularization in the derivative space offers advantages, compared to the classic solution
and the Tikhonov regularization, for spatial downscaling and fusion of non-Gaussian
multisensor precipitation data. Furthermore, we explore the use of Huber regularization in a
variational data assimilation experiment while the initial state of interest exhibits jump
discontinuities and non-Gaussian probabilistic structure. To this end, we focus on the heat
equation motivated by its fundamental application in the study of land surface heat and
mass fluxes.
Citation: Ebtehaj, A. M., and E. Foufoula-Georgiou (2013), On variational downscaling, fusion, and assimilation of
hydrometeorological states: A unified framework via regularization, Water Resour. Res., 49, 5944–5963, doi:10.1002/wrcr.20424.

1 Department of Civil Engineering, University of Minnesota, Minneapolis, Minnesota, USA.
2 School of Mathematics, University of Minnesota, Minneapolis, Minnesota, USA.
Corresponding author: E. Foufoula-Georgiou, Department of Civil Engineering, Saint Anthony Falls Laboratory, University of Minnesota, Minneapolis, MN 55414, USA. (efi@umn.edu)
©2013. American Geophysical Union. All Rights Reserved.
0043-1397/13/10.1002/wrcr.20424

1. Introduction

[2] In parallel to the growing technologies for earth remote sensing, we have witnessed an increasing interest to improve the accuracy of observations and integrate them with predictive models for enhancing our environmental forecast skills. Remote sensing observations are typically noisy and coarse-scale representations of a true state variable of interest, lacking sufficient details for fine-scale environmental modeling. In addition, environmental predictions are not perfect as models often suffer either from inadequate characterization of the underlying physics or inaccurate initialization. Given these limitations, several classes of estimation problems present themselves as continuous challenges for the atmospheric, hydrologic, and oceanic science communities. These include (1) downscaling (DS), which refers to the class of problems for enhancing the resolution of a measured or modeled state of interest by producing a fine-scale representation of that state with reduced uncertainty; (2) data fusion (DF), to produce an improved estimate from a suite of noisy observations at different scales; and (3) data assimilation (DA), which deals with estimating initial conditions in a predictive model consistent with the available observations and the underlying model dynamics. In this paper, we revisit the problems of downscaling, data fusion, and data assimilation focusing on a common thread between them as variational ill-posed inverse problems. Proper regularization and solution methods are proposed to efficiently handle large-scale data sets while preserving key statistical and geometrical properties of the underlying field of interest, namely, non-Gaussian and structured variability in real or transformed domains. Here, we only examine a few hydrometeorological inverse problems with particular emphasis on land-surface applications.

[3] In land-surface hydrologic studies, DS of precipitation and soil moisture observations has received considerable attention, using a relatively wide range of methodologies. DS methods in hydrometeorology and climate studies generally fall into three main categories, namely, dynamic downscaling, statistical downscaling, and variational downscaling. Dynamic downscaling often uses a regional physically based model to reproduce fine-scale details of the state of interest consistent with the large-scale
EBTEHAJ AND FOUFOULA-GEORGIOU: REGULARIZED DOWNSCALING, DATA FUSION, AND ASSIMILATION

observations or outputs of a global circulation model [e.g., Reichle et al., 2001a; Castro et al., 2005; Zupanski et al., 2010]. Statistical downscaling methods encompass a large group of methods that typically use empirical multiscale statistical relationships, parameterized by observations or other environmental predictors, to reproduce realizations of fine-scale fields. Precipitation and soil moisture statistical downscaling has been mainly approached via spectral and (multi)fractal interpolation methods, capitalizing on the presence of a power law spectrum and a statistical self-similarity/self-affinity in precipitation and soil moisture fields [Lovejoy and Mandelbrot, 1985; Lovejoy and Schertzer, 1990; Gupta and Waymire, 1993; Kumar and Foufoula-Georgiou, 1993; Perica and Foufoula-Georgiou, 1996; Veneziano et al., 1996; Wilby et al., 1998a, 1998b; Deidda, 2000; Kim and Barros, 2002; Rebora et al., 2005; Badas et al., 2006; Merlin et al., 2006; among others]. In variational approaches, a direct cost function is defined whose optimal point is the desired fine-scale field which can be obtained via using an optimization method. Recently along this direction, Ebtehaj et al. [2012] cast the rainfall DS problem as an inverse problem using sparse regularization to address the intrinsic rainfall singularities and non-Gaussian statistics. This variational approach belongs to the class of methodologies presented and extended in this paper.

[4] The DF problem has also been a subject of continuous interest in the precipitation science community mainly due to the availability of rainfall measurements from multiple spaceborne (e.g., TRMM and GOES satellites) and ground-based sensors (e.g., the NEXRAD network and rain gauges). The accuracy and space-time coverage of remotely sensed rainfall are typically conjugate variables. In other words, more accurate observations are often available with lower space-time coverage and vice versa. For instance, low-orbit microwave sensors provide more accurate observations but with less space-time coverage compared to the high-orbit geo-stationary infrared (GOES-IR) sensors. Moreover, there are often multiple instruments on a single satellite (e.g., precipitation radar and microwave imager on TRMM), each of which measures rainfall with different footprints and resolutions. A wide range of methodologies, including weighted averaging, regression, filtering, and neural networks, has been applied to combine microwave and Geo-IR rainfall signals [e.g., Adler et al., 2003; Huffman et al., 1995; Sorooshian et al., 2000; Huffman et al., 2001; Hong et al., 2004; Huffman et al., 2007]. Furthermore, a few studies have addressed methodologies to optimally combine the products of the TRMM precipitation radar (PR) with the TRMM microwave imager (TMI) using Bayesian inversion and weighted least squares (WLS) approaches [e.g., Masunaga and Kummerow, 2005; Kummerow et al., 2010]. From another direction, Gaussian filtering methods on Markovian tree-like structures, the so-called scale recursive estimation (SRE), have been proposed to merge spaceborne and ground-based rainfall observations at multiple scales [e.g., Gorenburg et al., 2001; Tustison et al., 2003; Bocchiola, 2007; Van de Vyver and Roulin, 2009; Wang et al., 2011], see also Kumar [1999] for soil moisture applications. Recently, using the Gaussian-scale mixture probability model and an adaptive filtering approach, Ebtehaj and Foufoula-Georgiou [2011a] proposed a fusion methodology in the wavelet domain to merge TRMM-PR and ground-based NEXRAD measurements, aiming to preserve the non-Gaussian structure and local extremes of precipitation fields.

[5] Data assimilation has played an important role in improving the skill of environmental forecasts and has become by now a necessary step in operational predictive models [see Daley, 1993]. Data assimilation amounts to integrating the underlying knowledge from the observations into the first guess or the background state, typically provided by a physical model from the previous forecast step. The goal is then to obtain an improved estimate of the current state of the system with reduced uncertainty, the so-called analysis. The analysis is then used to forecast the state at the next time step and so on (see Daley [1993] and Kalnay [2003] for a comprehensive review). One of the most common approaches to the data assimilation problem relies on variational techniques [e.g., Sasaki, 1958; Lorenc, 1986; Talagrand and Courtier, 1987; Courtier and Talagrand, 1990; Parrish and Derber, 1992; Zupanski, 1993; Courtier et al., 1994; Reichle et al., 2001b; Margulis and Entekhabi, 2003; among many others]. In these methods, one explicitly defines a cost function, typically quadratic, whose unique minimizer is the analysis state. On the other hand, very recently, Freitag et al. [2012] proposed a regularized variational data assimilation scheme to improve assimilation results in advection-dominated flow in the presence of sharp weather fronts.

[6] The common thread in the DS, DF, and DA problems is that, in all of them, we seek an improved estimate of the true state given a suite of noisy and down-sampled observations and/or uncertain model-predicted states. Specifically, let us suppose that the unknown true state in continuous space is denoted by $x(t)$ and its indirect observation (or model output), by $y(r)$. Let us also assume that $x(t)$ and $y(r)$ are related via a linear integral equation, called the Fredholm integral equation of the first kind, as follows:

$$\int_0^1 H(r,t)\,x(t)\,dt = y(r), \quad 0 \le r \le 1, \tag{1}$$

where $H(r,t)$ is the known kernel relating $x(t)$ and $y(r)$. Recovery of $x(t)$ knowing $y(r)$ and $H(r,t)$ is a classic linear inverse problem. Clearly, the deconvolution problem is a very special case with the kernel of the form $H(r-t)$, which in its discrete form plays a central role in this paper. Linear inverse problems are by nature ill-posed, in the sense that they do not satisfy at least one of the following three conditions: (1) existence, (2) uniqueness, and (3) stability of the solution. For instance, when due to the kernel architecture, the dimension of the observation is smaller than that of the true signal, infinite choices of $x(t)$ may lead to the same $y(r)$ and there is no unique solution for the problem. For the case when $y(r)$ is noisy and has a larger dimension than the true state, the solution is typically very unstable because the high-frequency components in $y(r)$ are typically amplified and spoil the solution in the inversion process. A common approach to make an inverse problem well posed is via the so-called regularization methods [e.g., Hansen, 2010]. The goal of regularization is to properly constrain the inverse problem aiming to obtain a unique and sufficiently stable solution. The choice of regularization typically depends on the continuity and degree of
smoothness of the state variable of interest, often called the regularity condition. For instance, some state variables or environmental fluxes are very regular with high degree of smoothness and differentiability (e.g., pressure), while others might be more irregular and suffer from frequent and different sorts of discontinuities (e.g., rainfall). In fact, it can be shown that the proper choices of regularization not only yield unique and stable solutions but also reinforce the underlying regularity of the true state in the solution. It is important to note that different regularity conditions are theoretically consistent with different statistical signatures in the true state, a fact that may guide proper design of the regularization, as explored in this study.

[7] The central goal of this paper is to propose a unified framework for the class of DS, DF, and DA problems by recasting them as discrete linear inverse problems using a relevant regularization in the derivative space, aiming to solve them more accurately compared to the classic weighted least squares (WLS) formulations. From a statistical standpoint, the main motivation is to explicitly incorporate non-Gaussianity of the underlying state in the derivative domain as a prior knowledge to obtain an improved estimate of jump and isolated extreme variabilities in the time-space structure of the hydrometeorological state of interest. Note that the proposed framework relies on the seminal works by, for example, Tibshirani [1996], Chen et al. [2001], Candes and Tao [2006], and recent developments in mathematical formalisms of inverse problems [e.g., Hansen, 2010; Elad, 2010], which have received a great deal of attention in statistical regression and image processing, but are relatively new to the communities of hydrologic and atmospheric sciences. To the best of our knowledge, in these areas, the only studies that explore these methodologies are Ebtehaj et al. [2012] and Freitag et al. [2012] for rainfall downscaling and data assimilation of sharp fronts, respectively.

[8] The presented methodologies for the DS and DF problems are examined through downscaling and data fusion of remotely sensed rainfall observations, which have fundamental applications in flash flood predictions, especially in small watersheds [Rebora et al., 2005; Siccardi et al., 2005; Rebora et al., 2006]. We show that the presented methodologies allow us to improve the quality of rainfall estimation and reduce estimation uncertainty by recovering the small-scale high-intensity rainfall extreme features, which have been lost in the low-resolution sampling of the sensor. For the DA family of problems, the promise of the presented framework is demonstrated via an elementary example using the heat equation, which plays a key role in the study of land surface heat and mass fluxes [e.g., Peter-Lidard et al., 1997; Liang et al., 1999]. The results demonstrate that the accuracy of the analysis and forecast cycles in a DA problem can be markedly improved, compared to the classic variational methods, especially when the initial state exhibits different forms of discontinuities.

[9] Section 2 provides conceptual insight into the discrete inverse problems. Section 3 describes the DS problem in detail, as a primitive building block for the other studied problems. Important classes of regularization methods are explained and their statistical interpretation is briefly discussed from the Bayesian point of view. Examples on rainfall downscaling are presented in this section by taking into account the specific regularity and statistical distribution of the rainfall fields in the derivative space. Section 4 is devoted to the regularized DF class of problems with examples and results on remotely sensed rainfall data. The regularized DA problem is discussed in section 5. Concluding remarks and future research perspectives are presented in section 6. The important duality between regularization and its statistical interpretation is further presented in Appendix A, while Appendix B is devoted to algorithmic details important for implementation of the proposed methodologies.

2. Discrete Inverse Problems: Conceptual Framework

[10] In this section, we briefly explain the conceptual key elements of discrete linear inverse estimation relevant to the problems at hand and leave further details for the next sections. Analogous to equation (1), linear discrete inverse problems typically amount to estimating the true high-resolution m-element state vector $x \in \mathbb{R}^m$ from the following observation model:

$$y = Hx + v, \tag{2}$$

where $y \in \mathbb{R}^n$ denotes the observations (e.g., output of a sensor), $H \in \mathbb{R}^{n \times m}$ is an $n \times m$ observation operator which maps the state space onto the observation space, and $v \sim \mathcal{N}(0, R)$ is the Gaussian error in $\mathbb{R}^n$. Note that the observation operator, which is a discrete representation of the kernel in equation (1), and the noise covariance are supposed to be known or properly calibrated. Depending on the relative dimension of y and x, this linear system can be under-determined ($m \gg n$) or overdetermined ($m \ll n$). In the under-determined case, there are infinite different x’s that satisfy equation (2), while for the overdetermined case a unique solution may not exist. As is evident, the DS problem belongs to the class of under-determined systems because the sensor output is a coarse-scale and noisy representation of the true state. However, the class of DF and DA problems falls into the category of overdetermined systems, as the total size of the observations and background state exceeds the dimension of the true state.

[11] In each of the above cases, we may naturally try to obtain a solution with minimum error variance by solving a linear WLS problem. However, for the under-determined case the solution still does not exist, while for the overdetermined case it is commonly ill-conditioned and sensitive to the observation noise (see section 4). Therefore, the minimum variance WLS treatment cannot properly make the above inverse problems well posed. To obtain a unique and stable solution, the basic idea of regularization is to further constrain the solution. For instance, among many solutions that fit the observation model in equation (2), we can obtain the one with minimum energy, mean-squared curvature, or total variation. The choice of this constraint or regularization highly depends on a priori knowledge about the underlying regularity of x. For sufficiently smooth x, we naturally may promote a solution with minimum mean-squared curvature to impose the desired smoothness on the solution. However, if the state is nonsmooth and contains
frequent jumps and discontinuities, a solution with minimum total variation might be a better choice. In subsequent sections, we explain these concepts in more detail for the DS, DF, and DA problems with examples relevant to some land-surface hydrometeorological problems.

3. Regularized Downscaling

3.1. Problem Formulation

[12] To put the DS problem in a linear inverse estimation framework, we recognize that in the observation model of equation (2), the true high-resolution (HR) state $x \in \mathbb{R}^m$ has a larger dimension than the low-resolution (LR) observation vector $y \in \mathbb{R}^n$, that is, $m \gg n$. Throughout this work, a notation is adopted in which the vector $x \in \mathbb{R}^m$ may also represent, for example, a 2-D field $X \in \mathbb{R}^{\sqrt{m} \times \sqrt{m}}$, which is vectorized in a fixed order (e.g., lexicographical).

[13] As explained in the previous section, the DS problem naturally amounts to obtaining the best WLS estimate $\hat{x}$ of the HR or fine-scale true state as follows:

$$\hat{x} = \underset{x}{\operatorname{argmin}} \left\{ \frac{1}{2} \| y - Hx \|_{R^{-1}}^2 \right\}, \tag{3}$$

where $\|x\|_A^2 = x^T A x$ denotes the quadratic norm, while A is a positive definite matrix. Due to the ill-posed nature of the problem, this optimization does not have a unique solution, as setting the derivative of the cost function to zero, the Hessian $(H^T R^{-1} H)$ is definitely singular. To narrow down all possible solutions to a stable and unique one, a common choice is to regularize the problem by constraining the squared Euclidean norm of the solution to be less than a certain constant, that is, $\|Lx\|_2^2 \le \text{const.}$, where L is an appropriately chosen transformation and $\|x\|_2^2 = \sum_i |x_i|^2$ denotes the Euclidean $\ell_2$-norm. Note that, by putting a constraint on the Euclidean norm of the state, we not only narrow down the solutions but also implicitly suppress the large components of the inverted noise and reduce their spoiling effect on the solution.

[14] Using the theory of Lagrange multipliers, the dual form of the constrained version of the optimization in equation (3) is

$$\hat{x} = \underset{x}{\operatorname{argmin}} \left\{ \frac{1}{2} \| y - Hx \|_{R^{-1}}^2 + \lambda \| Lx \|_2^2 \right\}, \tag{4}$$

where $\lambda > 0$ is the Lagrange multiplier or the so-called regularizer. This problem is a smooth convex quadratic programming problem and is known as the Tikhonov regularization with the following unique analytical solution:

$$\hat{x} = \left( H^T R^{-1} H + 2\lambda L^T L \right)^{-1} H^T R^{-1} y, \tag{5}$$

provided that $L^T L$ is positive definite [Tikhonov et al., 1977; Hansen, 1998; Golub et al., 1999; Hansen, 2010]. As is evident, the L transformation also plays a key role in the solution of the regularized DS problem. For instance, choosing an identity matrix in equation (4) implies that we are looking for a solution with the smallest Euclidean norm (energy), while if L represents a derivative operator, the above regularization term minimizes the energy in the derivative space, which naturally imposes extra smoothness on the solution.

[15] Depending on the intrinsic regularity of the underlying state and the selected L, other choices of the regularization term are also common. For example, in the case when the L projects a major part of the state vector onto (near) zero values, the preferred choice is the $\ell_1$-norm regularization [e.g., Tibshirani, 1996; Chen et al., 1998, 2001]. Such a property is often called sparse representation in the L space and gives rise to the following formulation of the regularized DS problem:

$$\hat{x} = \underset{x}{\operatorname{argmin}} \left\{ \frac{1}{2} \| y - Hx \|_{R^{-1}}^2 + \lambda \| Lx \|_1 \right\}, \tag{6}$$

where the $\ell_1$-norm is $\|x\|_1 = \sum_i |x_i|$. By choosing L as a derivative operator in equation (6), in effect we minimize a measure of total variation of the state of interest. It is well understood that in this case, we typically better recover discontinuities and local jump singularities compared to the $\ell_2$-norm regularization in the derivative domain. Note that, contrary to the Tikhonov regularization in equation (4), the $\ell_1$-norm regularization is a nonsmooth convex optimization as the regularization term is nondifferentiable and the conventional iterative gradient descent methods are no longer applicable in their standard forms.

[16] One of the common approaches to treat the nondifferentiability in equation (6) is to replace the $\ell_1$-norm with a smooth approximation, the so-called Huber norm, $\|x\|_{\mathrm{Hub}} = \sum_i \rho_T(x_i)$, where

$$\rho_T(x) = \begin{cases} x^2, & |x| \le T \\ T\,(2|x| - T), & |x| > T \end{cases} \tag{7}$$

and $T$ denotes a nonnegative threshold (Figure 1). The Huber norm is a hybrid norm that behaves similarly to the $\ell_1$-norm for values greater than the threshold $T$ while for smaller values it is identical to the $\ell_2$-norm. From the statistical regression point of view, the sensitivity of a norm as a penalty function to the outliers depends on the (relative) values of the norm for large residuals. If we restrict ourselves to convex norms, the least sensitive ones to the large residuals or say the outliers are those with linear behavior for large input arguments (i.e., $\ell_1$ and Huber). Because of this property, these norms are often called robust norms [Huber, 1964, 1981; Boyd and Vandenberghe, 2004]. Throughout this paper, for solving equation (6), we use the Huber relaxation due to its simplicity, efficiency, and adaptivity to all of the concerning classes of DS, DF, and DA problems. This issue is further discussed in Appendix B.

[17] In general, the first term in equations (4) and (6) measures how well the solution approximates the given (noisy) data, while the second term imposes a specific regularity on the solution. In effect, the regularizer plays a trade-off role between making the fidelity to the observations sufficiently large, while not imposing too much regularity (degree of smoothness) on the solution. The smaller the value of $\lambda$, the more weight is given to fitting the (noisy) observations which typically results in solutions that are less regular and prone to overfitting. On the other hand, the larger the value of $\lambda$, the more weight is given to
the regularization term which may result in a biased and overly smooth solution. Clearly, the goal is to find a balance between the two terms such that the solution is sufficiently close to the observations while obeying the underlying degree of regularity.

[18] It is important to note that, under the assumption of Gaussian error, the WLS problem (3) can be viewed as the maximum likelihood (ML) estimator of the HR field. On the other hand, the regularized problems (4) and (6) can be viewed as the Bayesian maximum a posteriori (MAP) estimator of the HR field. Indeed, the regularization terms refer to the prior knowledge about the probabilistic distribution of the state of interest. In other words, in equations (4) and (6), we implicitly assume that under the chosen transformation L, the state of interest can be well explained by the family of multivariate Gaussian $p(x) \propto \exp(-\lambda \|Lx\|_2^2)$ and Laplace $p(x) \propto \exp(-\lambda \|Lx\|_1)$ densities, respectively. Similarly, selecting the Huber norm can also be interpreted as assuming that $\log p(x) \propto -\sum_i \rho_T(x_i)$, which is equivalent to considering the Gibbs density function as the prior probability model [Geman and Geman, 1984; Schultz and Stevenson, 1994] (see Appendix A for details). The equivalence between the regularization, which imposes constraints on the regularity of the solution, and its Bayesian interpretation, which takes into account the prior probabilistic knowledge about the state of interest, is very insightful. This relationship establishes an important duality which can guide the selection of the regularization method depending on the statistical properties of the state of interest in the real or derivative space.

3.2. Application in Rainfall Downscaling

3.2.1. Problem Formulation

[19] As is evident, to downscale a remotely sensed hydrometeorological state, using the explained discrete regularization methods, we need to have proper mathematical models for the downgrading operator and also a priori knowledge about the form of the regularization term. Clearly, in the presented framework, the downgrading operator needs to be a linear approximation of the sampling property of the sensor. If a sensor directly measures the state of interest while its maximum frequency channel is smaller than the maximum frequency content of the state (e.g., precipitation), the result of the sensing would be a smoothed and possibly down-sampled version of the true state. Thus, each element of the observed state in a grid scale might be considered as an LR representation of the true state, lacking the HR subgrid scale variability. To have a simple and tractable mathematical model, the downgrading matrix might be considered translation invariant and decomposed into $H = DC$, where C encodes the smoothing effect and D contains information about the sampling rate of the sensor. To this end, let us suppose that each grid point in the LR observation is a (weighted) average of a finite size neighborhood of the true HR state around the center of the grid. In this case, the sensor smoothing property in C can be encoded by the filtering and convolution operations, while D acts as a linear operator to simulate down-sampling properties of the sensor (Figure 2). Note that these matrices can be formed explicitly, while direct matrix-vector multiplication (e.g., $Cx$ and $C^T x$, $x \in \mathbb{R}^m$) requires a computational cost in the order of $O(m^2)$. However, for large-scale problems, we do not need to explicitly perform these matrix-vector multiplications as there are efficient algorithms such as the fast Fourier transformation [Cooley and Tukey, 1965] that can perform convolution operations with computational cost of $O(m \log m)$.

[20] As is evident, the smoothing kernel needs to be estimated for each sensor, possibly by learning from a library of coincidental HR and LR observations or through a direct minimization of an associated cost [e.g., Ebtehaj et al., 2012]. In the absence of prior knowledge, one possible choice is to assume that the sensor observes a coarse grained (i.e., nonoverlapping box averaging) and noisy version of the true state. In other words, to produce a field at the grid scale of $s_c \times s_c$ from a $1 \times 1$, this assumption is equivalent to selecting a uniform smoothing kernel of size $s_c \times s_c$, followed by a down-sampling operation with ratio $s_c$ (Figure 3a).

[21] The error covariance matrix R in the observation model (2) plays a very important role on the results of the DS problem from both the mathematical and practical perspectives. Mathematically speaking, when the error is spatially white, the error covariance matrix is diagonal without any smoothing effect on the result [e.g., Gaspari and Cohn, 1999]; however, spatially correlated observation errors give rise to smoother results. Moreover, correlated errors with finite correlation length give rise to band error covariance matrices, which are prone to ill conditioning. This ill-conditioning is typically more severe in the case of ensemble error covariance estimation when the number of samples is typically much smaller than the observational dimension of the problem [e.g., Ledoit and Wolf, 2004]. Practically speaking, this error term captures the instrumental (e.g., ground-based NEXRAD radar) error. Although practical characterization of this error term is not in the scope of this study, for operational purposes this term needs

Figure 1. The Huber penalty is a smooth relaxation of the $\ell_1$-norm which acts quadratically for input values smaller than the threshold $T$, while it behaves linearly for larger inputs. For heavy-tailed inputs, linear penalization in the regularization term is advantageous compared to the quadratic penalization in which the overall cost function becomes dominated by a few large values in the tail of the distribution.
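
The Huber penalty of equation (7), and its use in place of the $\ell_1$ term of equation (6), can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: a 1-D piecewise-constant state, a box-averaging `H`, a first-difference `L`, white noise with $R = \sigma^2 I$, the values of `lam` and `T`, and a plain fixed-step gradient descent. The solver used in the paper is described in its Appendix B, which is not reproduced here, so the descent loop below is a stand-in rather than the authors' algorithm. The Tikhonov estimate of equation (5) is computed alongside for comparison.

```python
import numpy as np

def huber(x, T):
    """Huber penalty of equation (7): x^2 if |x| <= T, else T*(2|x| - T)."""
    a = np.abs(x)
    return np.where(a <= T, x ** 2, T * (2.0 * a - T))

def huber_grad(x, T):
    """Derivative of the Huber penalty: 2x inside the threshold, 2*T*sign(x) outside."""
    return np.where(np.abs(x) <= T, 2.0 * x, 2.0 * T * np.sign(x))

def box_average_operator(m, sc):
    """Assumed H: n x m matrix averaging non-overlapping blocks of length sc."""
    n = m // sc
    H = np.zeros((n, m))
    for i in range(n):
        H[i, i * sc:(i + 1) * sc] = 1.0 / sc
    return H

def first_difference(m):
    """Assumed L: (m-1) x m first-order difference operator."""
    return np.diff(np.eye(m), axis=0)

def tikhonov(H, R_inv, L, y, lam):
    """Closed-form solution of equation (5)."""
    A = H.T @ R_inv @ H + 2.0 * lam * (L.T @ L)
    return np.linalg.solve(A, H.T @ R_inv @ y)

def huber_descent(H, R_inv, L, y, lam, T, n_iter=500):
    """Fixed-step gradient descent on 0.5*||y - Hx||^2_{R^-1} + lam*sum(huber(Lx))."""
    def cost(x):
        r = y - H @ x
        return 0.5 * r @ R_inv @ r + lam * huber(L @ x, T).sum()
    # Upper bound on the Lipschitz constant of the gradient (Huber curvature <= 2).
    lip = np.linalg.norm(H.T @ R_inv @ H, 2) + 2.0 * lam * np.linalg.norm(L.T @ L, 2)
    x = np.zeros(H.shape[1])
    costs = [cost(x)]
    for _ in range(n_iter):
        g = -H.T @ (R_inv @ (y - H @ x)) + lam * (L.T @ huber_grad(L @ x, T))
        x = x - g / lip
        costs.append(cost(x))
    return x, np.array(costs)

# Synthetic test case (all values are illustrative assumptions).
rng = np.random.default_rng(0)
m, sc, sigma = 64, 4, 0.1
x_true = np.zeros(m)
x_true[20:36] = 1.0          # a plateau with two jumps
x_true[44:48] = 3.0          # an isolated high-intensity cell
H = box_average_operator(m, sc)
L = first_difference(m)
R_inv = np.eye(m // sc) / sigma ** 2
y = H @ x_true + sigma * rng.standard_normal(m // sc)

lam, T = 1.0, 0.05
x_tik = tikhonov(H, R_inv, L, y, lam)
x_hub, costs = huber_descent(H, R_inv, L, y, lam, T)
print("Tikhonov RMSE:", np.sqrt(np.mean((x_tik - x_true) ** 2)))
print("Huber RMSE:   ", np.sqrt(np.mean((x_hub - x_true) ** 2)))
```

In this sketch $L^T L$ is singular, because constants lie in the null space of a difference operator, yet the linear system is still solvable because block averaging does not annihilate constants, so $H^T R^{-1} H + 2\lambda L^T L$ is positive definite. Which of the two estimates has the smaller error depends on `lam`, `T`, and the noise realization, so no particular outcome is claimed here.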


Figure 2. Two-dimensional mathematical models for the smoothing and down-sampling properties of an LR sensor via the convolution operation. (a) A simple representation of an observation model for a neighborhood of size 3 × 3 using a simple smoothing (averaging) observation operator. (b and c) A sample effect of the filtering operation ($C$) and its transpose ($C^T$) on a discrete 2-D unit pulse, given the 3 × 3 kernel on the left. (d) A sample effect of the 2-D down-sampling operator ($D$) and its transpose ($D^T$) with scaling ratio 2.
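
As a rough illustration of the operators described in the caption of Figure 2, the following Python sketch applies a 3 × 3 averaging kernel and then a down-sampling of ratio 2 to a discrete 2-D unit pulse. The function names, the zero padding at the borders, and the 6 × 6 grid are assumptions made for this example; they are not taken from the paper.

```python
def smooth(field, k=3):
    """Apply a k x k averaging kernel (the filtering step) with zero padding."""
    rows, cols = len(field), len(field[0])
    r = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        total += field[ii][jj]
            out[i][j] = total / (k * k)
    return out


def downsample(field, ratio=2):
    """Keep every ratio-th pixel in each direction (the down-sampling step)."""
    return [row[::ratio] for row in field[::ratio]]


def observe(field, k=3, ratio=2):
    """Noise-free LR observation: smoothing followed by down-sampling."""
    return downsample(smooth(field, k), ratio)


# Discrete 2-D unit pulse on a hypothetical 6 x 6 grid.
pulse = [[0.0] * 6 for _ in range(6)]
pulse[2][2] = 1.0

blurred = smooth(pulse)    # the pulse is spread over a 3 x 3 neighborhood
low_res = observe(pulse)   # 3 x 3 field after down-sampling by 2
```

In this sketch the smoothed pulse keeps its total mass, while the down-sampled field retains only one of the nine smoothed values, which is one way to see why recovering the HR state from such observations is ill posed.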

to be properly estimated and calibrated based on observational and theoretical studies [e.g., Ciach and Krajewski, 1999; Hossain and Anagnostou, 2005, 2006; Krajewski et al., 2011; Maggioni et al., 2012; AghaKouchak et al., 2012].
[22] The choice of the regularization term also plays a very important role in the accuracy of the DS solution. Figure 4a demonstrates a NEXRAD reflectivity snapshot (resolution of 1 × 1 km) over the Texas TRMM satellite ground validation site, while Figure 4b displays the standardized histogram of the discrete Laplacian coefficients (second-order differences) and the fitted exponential of the form $p(x) \propto \exp(-\lambda|x|)$. It is seen that the analyzed rainfall image exhibits (nearly) sparse representation in the derivative space with a large mass around zero and heavier tail than the Gaussian.

Figure 3. (a) A uniform smoothing (low pass) kernel of size $s_c \times s_c$. (b) The discrete (high pass) generalized Laplacian filter of size 3 × 3, whose parameter ranges between 0 and 1. The Laplacian coefficients, obtained by filtering the 2-D state with the Laplacian kernel, are approximate measures of the second-order derivative. Throughout this paper, we set this parameter to 0.5, which corresponds to the standard second-order differencing operation.

[23] This well-behaved non-Gaussian structure in the derivative space mainly arises due to the presence of spatially coherent and correlated patterns in the rainfall fields which contain sharp transitions (large gradients) and isolated singularities (high-intensity rain cells). In effect, over the large areas of almost uniform rainfall reflectivity values, a measure of derivative translates those values into a large number of (near) zero coefficients; however, over the less frequent jumps and isolated high-intensity rain cells, derivative coefficients are markedly larger than zero and form the tails. Note that this non-Gaussianity is due to the intrinsic spatial structure of rainfall fields and cannot be resolved by a logarithmic or power law transformation (e.g., Z-R relationship). It is seen that after applying a relevant Z-R relationship on the reflectivity fields, the shape of the rainfall histogram remains non-Gaussian and still can be approximated by the Laplace density (not shown here).
[24] The universality of this statistical structure in the distribution of derivative coefficients has been observed in many rainfall reflectivity fields [Ebtehaj and Foufoula-Georgiou, 2011b], denoting that the choice of the Laplace prior and $\ell_1$-norm regularization is preferred in the rainfall DS problems rather than the choice of the Tikhonov


Figure 4. A rainfall reflectivity field and the distribution of its standardized Laplacian coefficients, $Lx_n = Lx/\mathrm{std}(Lx)$, where $\mathrm{std}(\cdot)$ is the standard deviation. (a) NEXRAD reflectivity snapshot at the TRMM GV-site in Houston, TX (HSTN) on 11/13/1998 (00:02:00 UTC) at scale 1 × 1 km. (b) The histogram of the standardized Laplacian coefficients, with the kernel parameter set to 0.5 (Figure 3b) and (c) their corresponding log histogram. Note that the zero coefficients over the nonrainy background have been excluded from the histogram analysis. The solid line in Figure 4b is the least squares fitted exponential of the form $p(x) \propto \exp(-\lambda|x|)$, and the dash-dot line shows a standard normal distribution for comparison. The log histogram in Figure 4c contrasts the heavy-tailed structure of the Laplacian coefficients with the Gaussian distribution more clearly than the original histogram in Figure 4b.
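
The near-sparse, heavy-tailed behavior of second-order differences summarized in Figure 4 can be reproduced on a toy field. The sketch below is only an illustration under stated assumptions: it uses the standard 5-point Laplacian rather than the generalized kernel of Figure 3b, a synthetic piecewise-constant field instead of NEXRAD data, and helper names that are hypothetical.

```python
def laplacian_coefficients(field):
    """5-point second-order differences at the interior pixels."""
    rows, cols = len(field), len(field[0])
    coeffs = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            coeffs.append(field[i - 1][j] + field[i + 1][j]
                          + field[i][j - 1] + field[i][j + 1]
                          - 4.0 * field[i][j])
    return coeffs


def standardize(values):
    """Divide by the (population) standard deviation, as in Lx / std(Lx)."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [v / std for v in values]


def kurtosis(values):
    """Fourth standardized moment; equals 3 for a Gaussian."""
    mean = sum(values) / len(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m4 = sum((v - mean) ** 4 for v in values) / len(values)
    return m4 / (m2 * m2)


# Synthetic field: uniform background with one sharp high-intensity cell.
field = [[10.0] * 20 for _ in range(20)]
for i in range(7, 13):
    for j in range(7, 13):
        field[i][j] = 40.0

coeffs = standardize(laplacian_coefficients(field))
zero_fraction = sum(1 for c in coeffs if c == 0.0) / len(coeffs)
kurt = kurtosis(coeffs)
```

For this toy field most coefficients are exactly zero and the kurtosis is well above the Gaussian value of 3, which mirrors qualitatively the large mass at zero and the heavy tails discussed in the text.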
regularization. Throughout this paper, we use the Laplacian for $L$ not only for its sparsifying effect on rainfall fields but also because of our empirical evidence about its stabilizing role and computational adaptability for rainfall downscaling and data fusion problems.
[25] In practice, the histogram of the derivatives may exhibit a thicker tail than the Laplace density, requiring a heavier tail probability model, such as the Generalized Gaussian Density (GGD) of the form $p(x) \propto \exp(-\lambda|x|^p)$, where $p < 1$ [see Ebtehaj and Foufoula-Georgiou, 2011b]. However, using such a prior model gives rise to a nonconvex optimization problem in which convergence to the global minimum cannot be easily guaranteed. Therefore, the choice of the $\ell_1$-norm (the Laplace prior) for rainfall downscaling is indeed the closest convex relaxation that can partially fulfill the strict statistical interpretation of the rainfall fields in derivative domains. Following our observations related to the distribution of the rainfall derivatives, here we direct our attention to the Huber penalty function as a smooth approximation of the $\ell_1$ regularization, and cast the rainfall DS as the following constrained variational problem:

$$\hat{x} = \underset{x}{\operatorname{argmin}} \left\{ \frac{1}{2}\|y - Hx\|^2_{R^{-1}} + \lambda \|Lx\|_{\mathrm{Hub}} \right\} \quad \text{s.t. } x \succeq 0. \tag{8}$$

[26] Obviously, the constraint is due to the nonnegativity of the rainfall fields. In this study, we adopted the gradient projection (GP) method [Bertsekas, 1999, p. 228], to solve the above variational problem (see Appendix B).

3.2.2. Rainfall Downscaling Results
[27] The same rainfall snapshot shown in Figure 4 has been used to examine the performance of the proposed regularized DS methodology. Throughout the paper, to make the reported parameters independent of the intensity range, the rainfall reflectivity fields are first scaled into the range between 0 and 1; however, the downscaling results are presented in the true range.
[28] To demonstrate the performance of the proposed regularized DS methodology, the NEXRAD HR observation $x$ was assumed as the true state, while the LR observations $y$ were obtained by smoothing $x$ with an average filter of size $s_c \times s_c$, followed by a down-sampling operator with ratio $s_c$. Given the true state and constructed LR observations, we can quantitatively examine the effectiveness of the presented DS methodology by comparing the downscaled HR fields with the true HR field using some common quality metrics.
[29] Both the Huber and Tikhonov regularization methods were examined to downscale the observations from scales 4 × 4 and 8 × 8 km down to 1 × 1 km (Figure 5). A very small amount of white noise $v$ with standard deviation of 1e-2 (5% of the standard deviation of the reference rainfall field only over the wetted areas) was added to the LR observations (equation (2)), giving rise to a diagonal error covariance matrix. In both of the regularization methods, for downscaling from 4-to-1 and 8-to-1 km in grid spacing, the regularization parameter $\lambda$ was set to 5e-3 and 1e-2, respectively. These values were selected through trial and error; however, there are some formal methods for automatic estimation of this parameter, which are left for future work [e.g., Hansen, 2010, chap. 5]. In our experiments, it turned out that small values of the Huber threshold, typically less than 10% of the field maximum range of variability, led to a successful recovery of isolated singularities and local extreme rainfall cells (Figures 6 and 7).
[30] In the studied snapshot, coarse graining of the rainfall reflectivity fields to the scales of 4 × 4 and 8 × 8 km was equivalent to losing almost 20% and 30% of the rainfall energy in the reflectivity domain in terms of the relative root-mean-square error (RMSE), $\mathrm{RMSE} = \|x - \hat{x}\|_2 / \|x\|_2$ (see Table 1). Note that to compute the RMSE of the LR observations, the size of those fields was extended to the size of the true field using the nearest neighborhood interpolation, that is, each LR pixel was replaced with $s_c \times s_c$ pixels with the same intensity value. In addition to the


Figure 5. Sample results of the rainfall regularized downscaling (DS). (a) True HR rainfall reflectivity:
NEXRAD snapshot at the TRMM GV-site in Houston, TX (HSTN) on 11/13/1998 (00:02:00 UTC) at
resolution 1 × 1 km. (b and c) The synthetically generated, 4 × 4 and 8 × 8 km, coarse-scale and noisy
observations of the true rainfall reflectivity field. Left column: (d) Tikhonov and (f) Huber regularization
results for downscaling from 4 to 1 km ( ¼ 0.02). Right column: (e) Tikhonov and (g) Huber regular-
ized DS for downscaling from 8 to 1 km ( ¼ 0.04). Zooming views of the delineated box in Figure 5g
are shown in Figure 6.
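
The kind of Huber-regularized downscaling compared in Figure 5 can be sketched in one dimension with a projected gradient iteration. This is a minimal sketch under several assumptions that are not from the paper: $R = I$, a block-averaging observation operator without additional smoothing, a first-order difference for $L$ instead of the Laplacian, the common form of the Huber function (quadratic below the threshold `tau`, linear above it), a fixed step size, and hypothetical function names. It is not the configuration of Appendix B.

```python
def huber(t, tau):
    """Assumed Huber function: quadratic for |t| <= tau, linear beyond."""
    return t * t / (2.0 * tau) if abs(t) <= tau else abs(t) - tau / 2.0


def huber_grad(t, tau):
    """Derivative of the assumed Huber function."""
    if abs(t) <= tau:
        return t / tau
    return 1.0 if t > 0 else -1.0


def cost(x, y, sc, lam, tau):
    """0.5 * ||y - Hx||^2 + lam * sum of Huber penalties on first differences."""
    fit = sum((y[j] - sum(x[j * sc:(j + 1) * sc]) / sc) ** 2
              for j in range(len(y)))
    reg = sum(huber(x[i + 1] - x[i], tau) for i in range(len(x) - 1))
    return 0.5 * fit + lam * reg


def gradient(x, y, sc, lam, tau):
    g = [0.0] * len(x)
    # Data term: H^T (Hx - y), where H averages blocks of sc elements.
    for j in range(len(y)):
        resid = sum(x[j * sc:(j + 1) * sc]) / sc - y[j]
        for i in range(j * sc, (j + 1) * sc):
            g[i] += resid / sc
    # Regularization term: lam * L^T rho'(Lx), with L a first difference.
    for i in range(len(x) - 1):
        d = lam * huber_grad(x[i + 1] - x[i], tau)
        g[i + 1] += d
        g[i] -= d
    return g


def downscale(y, sc, lam=0.01, tau=0.05, step=0.5, iters=500):
    """Projected gradient: gradient step, then projection onto x >= 0."""
    x = [v for v in y for _ in range(sc)]  # nearest-neighbor starting point
    for _ in range(iters):
        g = gradient(x, y, sc, lam, tau)
        x = [max(0.0, xi - step * gi) for xi, gi in zip(x, g)]
    return x


# Toy HR state with two jumps, observed at one quarter of its resolution.
sc = 4
x_true = [0.0] * 10 + [1.0] * 8 + [0.0] * 14
y = [sum(x_true[j * sc:(j + 1) * sc]) / sc for j in range(len(x_true) // sc)]

x_start = [v for v in y for _ in range(sc)]
x_hat = downscale(y, sc)
```

The step size is chosen below the reciprocal of a bound on the gradient's Lipschitz constant for these parameter values, so the cost should not increase from one iteration to the next.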


Figure 6. A zoomed view for qualitatively comparing the Tikhonov (a and c) versus the Huber (b and d) regularization for the downscaling (DS) example in Figure 5. The results indicate a marginally improved performance by the Huber regularization, especially for the smaller scaling ratio. The Huber regularization yields sharper results and is better able to recover high-intensity rainfall cells and the correct range of variability; see Table 1 for a quantitative comparison using a suite of metrics and Figure 7.

relative RMSE measure, we also used three other metrics: (1) relative mean absolute error (MAE), $\mathrm{MAE} = \|x - \hat{x}\|_1 / \|x\|_1$; (2) a logarithmic measure often called the peak signal-to-noise ratio (PSNR), $\mathrm{PSNR} = 20\log_{10}\left(\max(\hat{x}) / \mathrm{std}(x - \hat{x})\right)$, where $\mathrm{std}(\cdot)$ denotes the standard deviation; and (3) the structural similarity index (SSIM) by Wang et al. [2004]. The PSNR (in dB) represents a measure that not only contains RMSE information but also encodes the recovered range. The SSIM varies between −1 and 1 and the upper bound refers to the case where the estimated $\hat{x}$ and reference (true) field $x$ are perfectly matched. The SSIM metric is popular in the image processing community as it takes into account not only the marginal statistics such as the RMSE but also the correlation structure between the estimated and reference field. This metric seems very promising for analyzing the forecast mismatch with observations in hydrometeorological studies, especially when the large-scale systematic errors (e.g., displacement error) might be more dominant than the random errors; see Ebtehaj et al. [2012] for applications of SSIM in rainfall downscaling.
[31] On average, it is seen that almost 25% of the lost relative energy of the rainfall reflectivity fields can be restored via the regularized DS (Table 1). The $\ell_2$-norm regularization led to smoother results, and as the scaling ratio grows, this regularization was almost incapable of recovering the peaks and the correct variability range of the rainfall reflectivity field (Figure 6). Typically, as expected, the Huber-norm regularization results are slightly better than the Tikhonov ones, although not always significantly. For large scaling ratios (i.e., $s_c > 4$), the results of those methods tended to coincide in terms of the selected lump quality metrics such as the RMSE. However, using the Huber regularization, the recovered range was markedly better than that by the Tikhonov regularization, as reflected in the PSNR metric and recovered range. For example, in downscaling from 8 to 1 km via the Tikhonov regularization, the maximum recovered reflectivity values are approximately 41 dBZ, while using the Huber-norm regularization the maximum values are 45 dBZ (Figure 5). Employing the classic Z-R relationship for the NEXRAD products (i.e., $Z = 300R^{1.4}$), one can easily check that the


Figure 7. The quantiles of the standard normal density versus the standardized distribution of the recovered rain rates (mm/h), using the $Z = 300R^{1.4}$ relationship, for the true HR field (red cross), the observed LR field (black plus), the downscaled HR fields via the Tikhonov regularization (green circle), and the Huber-norm regularization (blue square), respectively. (a and b) The quantile-quantile plots for the HR fields obtained by downscaling from 4 × 4 to 1 × 1 km and 8 × 8 to 1 × 1 km, respectively. The rainfall quantile values are only for the positive rainy part of the fields and are standardized by subtracting the mean and dividing by the standard deviation. The qq-plots signify that the Huber regularization performs better than the Tikhonov, especially over the tails, which represent the recovery of high-intensity and extreme rainfall values from the LR observations.

rain rates associated with the above reflectivity values are approximately 15 and 28 (mm/h), respectively. Therefore, although the lump quality metrics are comparable for the two methods in the reflectivity domain, the main advantage of the Huber norm over the $\ell_2$-norm is the recovery of local extreme rain rates (Figure 7). It is clear from the quantile-quantile plots in Figures 7a and 7b that for a small scaling ratio, for example, $s_c = 4$, the Huber regularization can very well reproduce both the tail and the body of the true rainfall distribution. However, the tail of the recovered rainfall distribution falls below the true rainfall distribution for larger scaling ratio, e.g., $s_c = 8$, indicating that in some high-intensity areas the method still underestimates the true field.

4. Regularized Data Fusion

4.1. Problem Formulation
[32] Analogous to the DS problem in the previous section, here we focus on the formulation of the DF problem. In the DF class of problems, typically an improved estimate of the true state is sought from a series of LR and noisy observations. Let $x \in \mathbb{R}^m$ be the true state of interest while a set of $N$ downgraded measurements, $y^i \in \mathbb{R}^{n_i}$, $i = 1, \ldots, N$, are available through the following linear observation model:

$$y^i = H^i x + v^i, \tag{9}$$

where $n_i \ll m$; $H^i \in \mathbb{R}^{n_i \times m}$ and $v^i \sim \mathcal{N}(0, R^i)$ denote an uncorrelated Gaussian error in $\mathbb{R}^{n_i}$; $\mathbb{E}_{i \neq j}\left[v^i (v^j)^T\right] = 0$. Compared to the DS family of problems, a DF problem is more constrained in the sense that usually there are more equations than the number of unknowns, $\sum_{i=1}^{N} n_i \geq m$, giving rise to an overdetermined linear system. As previously explained, naturally the linear WLS estimate of the true state, given the series of $N$ observations, amounts to solving the following optimization problem:

$$\hat{x} = \underset{x}{\operatorname{argmin}} \left\{ \frac{1}{2} \sum_{i=1}^{N} \left\| y^i - H^i x \right\|^2_{(R^i)^{-1}} \right\}. \tag{10}$$

[33] Note that the solution of the above problem not only contains information about all of the available observations (fusion) but also, with proper design of the observation operators, allows us to obtain an HR estimate of the state of interest (downscaling). Clearly, the inverse of each covariance matrix in equation (10) encodes the relative contribution or weight of each observation $y^i$ in the cost function. In other words, if the elements of the covariance matrix of a particular observation vector are large compared to those of the other observation vectors, naturally the contribution of that observation to the obtained solution would be less significant.
[34] For notational convenience, the above system of equations can be augmented as follows:

$$\begin{bmatrix} y^1 \\ \vdots \\ y^N \end{bmatrix} = \begin{bmatrix} H^1 \\ \vdots \\ H^N \end{bmatrix} x + \begin{bmatrix} v^1 \\ \vdots \\ v^N \end{bmatrix} \;\Rightarrow\; y = Hx + v, \tag{11}$$

where the concatenated error vector $v$ has the following block diagonal covariance matrix,

$$R = \mathbb{E}\left[vv^T\right] = \begin{bmatrix} R^1 & & 0 \\ & \ddots & \\ 0 & & R^N \end{bmatrix}. \tag{12}$$

[35] Therefore, the DF problem can be recast as the classic problem of estimating the true state from the augmented observation model of $y = Hx + v$. Thus, setting the gradient of the cost function in equation (10) to zero yields the following linear system:


Table 1. Results Showing the Effectiveness of the Proposed Regularized DS in Reducing the Estimation Error and Increasing the Accuracy of the Estimated Rainfall Fields^a

            Observations Versus True    Tikhonov-DS Versus True    Huber-DS Versus True
Metric^b    4 × 4 km    8 × 8 km        4 × 4 km    8 × 8 km       4 × 4 km    8 × 8 km
RMSE        0.19        0.29            0.15        0.20           0.14        0.19
MAE         0.15        0.25            0.13        0.18           0.11        0.17
SSIM        0.71        0.56            0.78        0.66           0.80        0.66
PSNR        23.8        19.6            26.5        23.1           27.0        24.0

^a The first two columns refer to the values of the quality metrics obtained by comparing the constructed LR observations with the true 1 × 1 km reflectivity field. The other columns show the metrics obtained by comparing the downscaled fields with the true rainfall field. The performance of the Huber prior is slightly better than the Tikhonov regularization, especially for the small scaling ratios (i.e., 4 × 4 km).
^b RMSE, relative root-mean-square error; MAE, relative mean absolute error; SSIM, structural similarity; and PSNR, peak signal to noise ratio (see section 3.2.2 for definitions).
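
For reference, three of the metrics reported in Table 1 can be computed as in the following sketch, which follows the definitions given in section 3.2.2 for fields flattened into 1-D lists. The function names are hypothetical, the population form of the standard deviation is an assumption, and SSIM is omitted because its definition is given by Wang et al. [2004] rather than in the text.

```python
import math


def relative_rmse(x, x_hat):
    """||x - x_hat||_2 / ||x||_2."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    den = math.sqrt(sum(a * a for a in x))
    return num / den


def relative_mae(x, x_hat):
    """||x - x_hat||_1 / ||x||_1."""
    num = sum(abs(a - b) for a, b in zip(x, x_hat))
    den = sum(abs(a) for a in x)
    return num / den


def psnr(x, x_hat):
    """20 * log10(max(x_hat) / std(x - x_hat)), in dB."""
    diff = [a - b for a, b in zip(x, x_hat)]
    mean = sum(diff) / len(diff)
    std = math.sqrt(sum((d - mean) ** 2 for d in diff) / len(diff))
    return 20.0 * math.log10(max(x_hat) / std)


# Tiny hand-checkable example (values are arbitrary).
x_true = [3.0, 4.0]
x_est = [3.0, 0.0]
rmse_value = relative_rmse(x_true, x_est)   # 4 / 5
mae_value = relative_mae(x_true, x_est)     # 4 / 7
psnr_value = psnr(x_true, x_est)            # 20 * log10(3 / 2)
```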

 
$$\left(H^T R^{-1} H\right)\hat{x} = H^T R^{-1} y. \tag{13}$$

[36] This problem is overdetermined with a unique solution; however, the Hessian $\left(H^T R^{-1} H\right)$ is likely to be very ill-conditioned. This ill-conditioning typically gives rise to an unstable solution with large estimation error [e.g., Elad and Feuer, 1997; Hansen, 2010]. Similar to the DS problem, one possible remedy for stabilizing the solution is the regularization. Recalling the formulation discussed in the previous section, a general regularized form of the rainfall DF problem can be written as

$$\hat{x} = \underset{x}{\operatorname{argmin}} \left\{ \frac{1}{2}\|y - Hx\|^2_{R^{-1}} + \lambda\, \psi_L(x) \right\} \quad \text{s.t. } x \succeq 0, \tag{14}$$

where the convex regularization function $\psi_L(x)$ can take different penalty norms, such as the Tikhonov $\|Lx\|_2^2$, the $\ell_1$-norm $\|Lx\|_1$, or the Huber norm $\|Lx\|_{\mathrm{Hub}}$. As is evident, similar to the DS problem, solution of equation (10) is equivalent to the frequentist ML estimator of the HR field while equation (14) is the Bayesian MAP estimator. For further explanations and statistical interpretations please see Appendix A.

4.2. Application in Rainfall Data Fusion and Results
[37] To quantitatively evaluate the effectiveness of the proposed regularized DF methodology for rainfall data, we constructed two synthetic LR and noisy observations from the original HR NEXRAD reflectivity snapshot. To resemble different sensing protocols and specifications, we chose different smoothing and down-sampling operations to construct each of the synthetic observation fields. The first observation field $y^1$ was produced at resolution 6 × 6 km using a simple averaging filter of size 6 × 6, followed by a down-sampling ratio of $s_c = 6$. Analogously, the second field $y^2$ was generated at scale 12 × 12 km using a Gaussian smoothing kernel of size 12 × 12 with a standard deviation of 4 km, followed by a down-sampling ratio of $s_c = 12$. To resemble the measurement random error, white Gaussian errors with standard deviations of 1e-2 and 2e-2 were also added respectively, which are equivalent to 5% and 10% of the standard deviation of the reference rainfall field only over the wetted areas. Roughly speaking, this selection of the error magnitudes implies that the degree of confidence (relative weight) on the observations at 6 × 6 km is twice that of the observations at 12 × 12 km. Here we only restrict our consideration to the Huber norm regularization because of its consistency with the underlying rainfall statistics and its better performance in recovering the rainfall heavy-tailed structure (Figure 7). To solve the DF problem, we have used the same settings for the gradient projection (GP) method as explained in Appendix B.
[38] The solution of the ill-conditioned WLS formulation or the ML estimator in equation (10) is blocky, out of range, and severely affected by the amplified inverted noise (Figure 8c). On the other hand, the regularized DF can properly restore a fine-scale and coherent estimate of the rainfall field. The results show that more than 30% of the uncaptured subgrid energy of the examined rainfall reflectivity field can be restored through solving the proposed methodology (Table 2). As is evident, improvements of the selected fidelity measures in the DF problem are more pronounced compared to the results of the DS experiment (see Table 1). This naturally arises because more observations are available in the DF problem than the DS one, and thus the solution is better constrained. In terms of the selected lump metrics, analogous to the DS problem, we observed that the Huber-norm regularization is marginally better than the Tikhonov regularization, which is not reported here. However, as expected, in terms of recovery of the heavy-tailed structure of the rainfall, it is verified that the Huber-norm regularization can capture the lost extreme values much better than the Tikhonov regularization (see Figure 9). It is clear from Figure 9 that the Huber-norm regularization very well captures the local extreme rainfall intensity values while the Tikhonov regularization falls short and can only partially recover those extreme intensities.

5. Regularized Variational Data Assimilation

5.1. Problem Formulation
[39] Compared to the previously explained problems of downscaling and data fusion, the data assimilation problem is more involved in the sense that we also need to incorporate the evolution of a dynamical system in the estimation process. Despite the increased complexity, DA shares the same principles with the explained formulations of the DS and DF problems, from the estimation point of view. Here we briefly explain the classic linear three-dimensional variational (3D-VAR) data assimilation scheme and extend its formulation to a regularized format. Sample results of the


Figure 8. Data fusion and downscaling of multisensor remotely sensed rainfall reflectivity fields using the Huber regularization. (a and b) Reconstructed LR and noisy rainfall observations at scale 6 and 12 km in grid spacing. (c) The results of the WLS solution in equation (10) and (d) the solution of the regularized DF using the Huber norm with $\lambda$ = 1e-3 and $\tau$ = 1e-2.
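
To make the weighting role of the error covariances in the WLS fusion of equation (10) concrete, consider the simplest special case in which every observation operator is the identity and every covariance is a scaled identity. The normal equations then reduce to an inverse-variance weighted average at each pixel. The sketch below covers only this special case; the function name and the numerical values are hypothetical.

```python
def fuse(observations, sigmas):
    """Pixelwise WLS fusion for identity operators and R^i = sigma_i^2 I.

    Each observation is weighted by the inverse of its error variance.
    """
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    n = len(observations[0])
    return [sum(w * obs[k] for w, obs in zip(weights, observations)) / total
            for k in range(n)]


# Two observations of the same two-pixel state; the second is noisier.
y1 = [1.0, 0.0]
y2 = [2.0, 0.0]
fused = fuse([y1, y2], sigmas=[1.0, 2.0])
```

With these values the first observation receives four times the weight of the second, so the fused value at the first pixel is 1.2 rather than the plain average of 1.5.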

regularized variational data assimilation problem are illustrated on the estimation of the initial conditions of the heat equation in a 3D-VAR setting.
[40] The 3D-VAR is a memoryless assimilation method. In other words, at each time step, the best estimate of the true initial state or analysis state is obtained based only on the present-time noisy observations and the background state. The analysis is then used for forecasting the state at the next time step and so on.
[41] Suppose that the true initial state of interest at discrete time $t_0$ is denoted by $x_0 \in \mathbb{R}^m$, the observation is $y_0 \in \mathbb{R}^n$, and $x_0^b \in \mathbb{R}^m$ represents the background state. In the linear 3D-VAR data assimilation problem, obtaining the analysis state $x_0^a \in \mathbb{R}^m$ amounts to finding the minimum point of the following cost function:
Table 2. Values of the Selected Fidelity Metrics in the Rainfall DF Experiment Using the Huber Regularization, see Section 3.2.2 for the Definitions^a

          Observations Versus True     Huber-DF Versus True
Metric    6 × 6 km    12 × 12 km       1 × 1 km
RMSE      0.25        0.35             0.17
MAE       0.21        0.32             0.15
SSIM      0.60        0.50             0.72
PSNR      21.3        18.1             25.0

^a Here the first two columns refer to comparison of the LR (6 × 6 and 12 × 12 km) observations with the true rainfall field, and the last column presents the metrics obtained by comparing the DF results with the true field.

Figure 9. Quantiles of the standardized distribution of the recovered rain rates (mm/h), using the $Z = 300R^{1.4}$ relationship, versus standard normal quantiles. It is clear that the Huber-norm regularization results in a better recovery of the rainfall extremes than the Tikhonov regularization. Evidently, because of extra information coming from multiple sensory data, the recovery of extreme rain rates is improved in the DF experiment compared to the DS results; see Figure 7.
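
The conversion from reflectivity to rain rate behind Figures 7 and 9 can be checked with a few lines of Python. The sketch assumes the usual convention that reflectivity in dBZ equals $10\log_{10} Z$ with $Z$ in linear units, and inverts $Z = 300R^{1.4}$; the function name is hypothetical.

```python
def rain_rate_from_dbz(dbz, a=300.0, b=1.4):
    """Invert Z = a * R**b, with Z = 10**(dBZ / 10), to get R in mm/h."""
    z_linear = 10.0 ** (dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)


# Maximum recovered reflectivities quoted for the 8-to-1 km downscaling.
rate_tikhonov = rain_rate_from_dbz(41.0)
rate_huber = rain_rate_from_dbz(45.0)
```

Under these assumptions the two values come out close to the approximate rain rates of 15 and 28 mm/h quoted in the text for 41 and 45 dBZ.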


$$J_{3D}(x_0) = \frac{1}{2}\left\|x_0^b - x_0\right\|^2_{B^{-1}} + \frac{1}{2}\left\|y_0 - Hx_0\right\|^2_{R^{-1}}. \tag{15}$$

[42] In the cost function (15), $B \in \mathbb{R}^{m \times m}$ and $R \in \mathbb{R}^{n \times n}$ are the background and observation error covariance matrices and $H$ is the observation operator. The analysis is then defined as the minimizer of equation (15), denoted as $x_0^a = \operatorname{argmin}_{x_0}\{J_{3D}(x_0)\}$. Clearly, this 3D-VAR problem is a WLS problem, which has the following analytical solution:

$$x_0^a = \left(B^{-1} + H^T R^{-1} H\right)^{-1}\left(B^{-1} x_0^b + H^T R^{-1} y_0\right). \tag{16}$$

[43] Because the error covariance matrices are positive definite, the matrix $B^{-1} + H^T R^{-1} H$ is always positive definite and hence invertible. Thus, solution of the 3D-VAR requires no rank or dimension assumption on $H$. However, this problem might be very ill-conditioned depending on the architecture of the covariance matrices and the measurement operator.
[44] Analogous to the previous discussions, the generic regularized form of the linear 3D-VAR under the predetermined transformation $L$ might be considered as follows:

$$x_0^a = \underset{x_0}{\operatorname{argmin}} \left\{ J_{3D}(x_0) + \lambda\, \psi_L(x_0) \right\}, \tag{17}$$

where $\psi_L(x_0)$ can take any of the explained regularization penalty functions, including the smooth Tikhonov $\|Lx_0\|_2^2$, the nonsmooth $\ell_1$-norm $\|Lx_0\|_1$, and the smooth Huber norm $\|Lx_0\|_{\mathrm{Hub}}$.
[45] In the above-regularized formulations, the analysis not only becomes close to the background and observations, in the weighted Euclidean sense, but it is also enforced to follow a regularity imposed by the $\psi_L(x_0)$. Here we emphasize that the regularized formulation in equation (17) typically yields a more stable and improved analysis than the classic formulation in equation (15). However, this gain comes at the price of introducing a bias in the solution whose magnitude can be kept small by proper selection of the regularization parameter $\lambda$ [Hansen, 2010].

5.2. Application in the Study of Land Surface Heat and Mass Fluxes
[46] The promise of the proposed regularized 3D-VAR data assimilation methodology is shown via assimilating noisy and down-sampled observations into the dynamics of the heat equation. Diffusive transport of heat and moisture plays an important role in modeling of land surface water and energy balance processes [e.g., Peters-Lidard et al., 1997; Liang et al., 1999]. For example, in land surface energy balance, the ground heat flux is typically modeled by a 1-D heat diffusion equation for multiple layers of soil columns for which data assimilation has been the subject of special interest for improving hydrologic predictions [e.g., Entekhabi et al., 1994; Margulis et al., 2002; Drusch and Viterbo, 2007; Bateni and Entekhabi, 2012].
[47] Here we do not delve into a detailed parameterization of the heat equation for a real case study of land surface heat and water budget. Rather, we only focus on a simple well-controlled assimilation experiment to demonstrate the promise of the regularized DA framework. More specifically, we use a top-hat initial condition which is sparse in the derivative space and examine the results of the regularized DA while it evolves in time under the heat diffusion law. To this end, we construct an erroneous background state and LR noisy observations of the top-hat initial condition and then demonstrate the effectiveness of a proper regularization on the quality of the obtained analysis and forecast state. In the assimilation cycle, we obtain the analysis using the classic and regularized 3D-VAR assimilation methods and then examine those analysis states to obtain the forecast state at the next time step. The estimated analysis and forecast states are then compared with their available ground-truth counterparts.
[48] For a space-time representation of a 1-D scalar quantity $x(s, t)$, the well-known heat equation is

$$\frac{\partial x(s, t)}{\partial t} = E\,\nabla^2 x(s, t), \qquad x(s, 0) = x_0(s), \tag{18}$$

where $-\infty < s < \infty$, $0 < t < \infty$, and $E$ $(L^2/T)$ denotes the diffusivity constant. In the rest of the paper, for brevity and without loss of generality, we assume $E = 1$ $(L^2/T)$.
[49] It is well understood that the general solution of the heat equation at time $t$ is given by the convolution of the initial condition with the fundamental solution (kernel) as follows:

$$x(s, t) = \int K(s - r, t)\, x_0(r)\, dr, \tag{19}$$

where

$$K(s, t) = (4\pi E t)^{-1/2} \exp\left(-\frac{|s|^2}{4Et}\right). \tag{20}$$

[50] We can see that $x(s, t)$ is obtained via convolution of the initial condition with a Gaussian kernel of standard deviation $\sigma = \sqrt{2Et}$. Clearly, estimation of the initial condition $x_0(s)$ only from the diffused observations $x(s, t)$ is an ill-posed deconvolution problem (see equation (1)).
[51] To reconstruct a 3D-VAR assimilation experiment, we assume that the true top-hat initial condition in discrete space is a vector of 256 elements ($x \in \mathbb{R}^m$, where $m = 256$) as follows:

$$x_0 = \begin{cases} 2 & 112 \leq x_i \leq 144 \\ 1 & \text{otherwise}. \end{cases} \tag{21}$$

[52] We added a white Gaussian noise with $\sigma_w = 0.05$ (15% of the standard deviation of the initial state) to the true initial condition for defining the background state $x_0^b$ for the assimilation experiment.
[53] We assume that the observation vector is a downgraded and noisy version of the true state, with the sensor only capturing the mean of every four neighbor elements of the true state. In other words, the observation is a noisy and LR version of the true state with one quarter of its size (Figure 10). To this end, using the linear model in equation (2), we employ the following architecture for the observation operator:


Figure 10. (a) The true initial condition $x_0$ and the results of the heat equation at $t = 5$ and $t = 100$ (T) with $E = 1$ ($L^2/T$). (b) The reconstructed background state by adding a white noise with $\sigma_w = 0.05$ to the true initial state and (c) the LR and noisy observations with $\sigma_v = 0.03$, respectively. (d) The results of the classic 3D-VAR and the regularized version using the explained Tikhonov (T3D-VAR) and the Huber (H3D-VAR) regularization methods (see equation (17)). (e and f) Magnified parts of the graphs in Figure 10d over the shown zooming windows.
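
The classic 3D-VAR analysis of equation (16) used in this experiment can be sketched on a reduced problem. The sketch below assumes diagonal covariances $B = \sigma_w^2 I$ and $R = \sigma_v^2 I$ with the values quoted in the caption, shrinks the state to 16 elements, omits the random noise so that the result is reproducible, and uses a small hand-written linear solver. All function names are hypothetical and none of this is the authors' code.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def averaging_operator(m, block=4):
    """Each row averages `block` neighboring elements of the state."""
    return [[1.0 / block if j * block <= i < (j + 1) * block else 0.0
             for i in range(m)] for j in range(m // block)]


def analysis_3dvar(xb, y, h, sigma_w, sigma_v):
    """Analysis from equation (16) with B = sigma_w^2 I and R = sigma_v^2 I."""
    m, n = len(xb), len(y)
    b_inv, r_inv = 1.0 / sigma_w ** 2, 1.0 / sigma_v ** 2
    lhs = [[r_inv * sum(h[k][i] * h[k][j] for k in range(n))
            + (b_inv if i == j else 0.0) for j in range(m)] for i in range(m)]
    rhs = [b_inv * xb[i] + r_inv * sum(h[k][i] * y[k] for k in range(n))
           for i in range(m)]
    return solve(lhs, rhs)


# Reduced top-hat state and its noise-free LR observation.
m = 16
truth = [2.0 if 6 <= i <= 9 else 1.0 for i in range(m)]
H = averaging_operator(m)
y_clean = [sum(row[i] * truth[i] for i in range(m)) for row in H]

# A deliberately wrong flat background, corrected by the observation.
flat_background = [1.0] * m
xa = analysis_3dvar(flat_background, y_clean, H, sigma_w=0.05, sigma_v=0.03)
```

Because the observation only carries block averages, the classic analysis in this sketch shifts each block of the flat background by a constant and cannot reconstruct the jumps inside a block, which is the kind of limitation the regularized variants are meant to address.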

$$H = \frac{1}{4}\begin{bmatrix} 1111 & 0000 & \cdots & 0000 \\ 0000 & 1111 & \cdots & 0000 \\ \vdots & \vdots & \ddots & \vdots \\ 0000 & 0000 & \cdots & 1111 \end{bmatrix} \in \mathbb{R}^{n \times m}, \tag{22}$$

and impose a white Gaussian error with $\sigma_v = 0.03$, equivalent to 10% of the standard deviation of the true signal.
[54] The top-hat initial condition is selected to emphasize the role of regularization, especially regularization resulting from linear penalization (i.e., the Huber or $\ell_1$-norm). Clearly, the first-order derivative of the above initial condition is very sparse. In other words, the first-order derivative is zero everywhere on its domain except at the location of the two jumps, resembling a heavy tailed and sparse statistical distribution. This underlying structure prompts us to use a regularization norm with linear penalization and a first-order differencing operator for $L$ in equation (17), as follows:

$$L = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix} \in \mathbb{R}^{(m-1) \times m}. \tag{23}$$

[55] Figure 10 shows the inputs of the assimilation experiment and the results of the analysis cycle, using the classic versus the regularized 3D-VAR estimators. In this example, it is clear that the classic solution is subject to overfitting, while it slightly damps the noise. Indeed, the 3D-VAR is unable to effectively damp the high-frequency error components and recover the underlying true state. This overfitting may arise because the 3D-VAR cost function is a redundant WLS estimator and contains more information (both observations and background) than needed for a proper estimation of the true state. On the other hand, in the regularized assimilation methods, not only the error term but also a cost associated with the regularity of the underlying state is minimized. The Tikhonov regularization (T3D-VAR), i.e., $\psi_L(x) = \|Lx\|_2^2$, led to a smoother result compared to the classic one with slightly improved error statistics (Table 3). However, the result of the Huber regularization (H3D-VAR), i.e., $\psi_L(x) = \|Lx\|_{\mathrm{Hub}}$, is the best. The rapidly varying noisy components are effectively damped in this regularization, while the sharp jump discontinuities have been preserved better than the T3D-VAR. The quantitative metrics in Table 3 indicate that in the analysis cycle, the RMSE and MAE metrics are improved dramatically, up to 85% in the H3D-VAR, compared to other assimilation schemes.
[56] As previously explained, there is no unique and universally accepted methodology for automated selection of the regularization parameters, namely, the regularization parameter $\lambda$ and the Huber threshold. Here, to select the best parameters in the above assimilation examples, we performed a few trial and error experiments. In other words, over a feasible range of parameter values, we computed the analysis states and obtained the RMSE

metric by comparing them with the (known) true initial condition $x_0$ (Figure 11). Note that the true initial condition is definitely not available in practice; however, here we used it to obtain the optimal values of the regularization parameters in the RMSE sense for comparison purposes and for demonstrating the importance of a proper regularization. In the T3D-VAR, as expected, larger values of the regularization parameter ($\lambda_T$) typically damp rapidly varying error components of the noisy background and observations; however, they may give rise to an overly smooth solution with larger bias and RMSE (Figures 10e and 10f). Here, for the T3D-VAR experiment, we used the value $\lambda_T = 0.05$ associated with the minimum RMSE (Figure 11a). In the H3D-VAR, in addition to the regularizer $\lambda_H$, we also need to choose the optimal threshold value $\tau$ of the Huber norm. A contour plot of the RMSE values for different choices of $\lambda_H$ and $\tau$ is shown in Figure 11b. By inspection, we roughly chose $\lambda_H = 35$ and $\tau = 1.5 \times 10^{-3}$ for the H3D-VAR assimilation experiment presented in Figure 10.

Table 3. The Root-Mean-Square Error (RMSE) and the Mean Absolute Error (MAE) for the Studied Classic and Regularized 3D-VAR in the Analysis Cycle (A) and Forecast Step (F)

        RMSE                            MAE
Cycle   3D-VAR   T3D-VAR   H3D-VAR      3D-VAR   T3D-VAR   H3D-VAR
A       0.0475   0.0397    0.0067       0.0376   0.0317    0.0043
F       0.0090   0.0088    0.0043       0.0071   0.0070    0.0033

[57] The main purpose of the DA process is, indeed, to increase the quality of the forecast. Given the analysis state at initial time, we can forecast the profile of the scalar quantity, $x(s, t)$, at any future time step through the heat equation. One important property of the heat equation is its diffusivity. In other words, noisy components and rapidly varying perturbations in the initial analysis are naturally damped but become more correlated as the profile evolves in time. Thus, rapidly varying uncorrelated error components become slowly varying and correlated features whose detection and removal is naturally more difficult than in the case of uncorrelated ones. Figure 12a shows the forecast profile at $t = 10$ (T). The results indicate the importance of proper regularization on the quality of the forecast in the simple heat equation. The forecasts based on the classic 3D-VAR and the T3D-VAR almost coincide, while the T3D-VAR is marginally better. This behavior arises because neither of those methods could properly eliminate the noisy features in the analysis cycle; hence, slowly varying error components appear in the forecast profile. However, the quality metrics in Table 3 indicate that using H3D-VAR, the RMSE and MAE of the forecast are improved by more than 50% compared to the other methods.

Figure 11. (a) Root-mean-square error (RMSE) of the implemented T3D-VAR as a function of the regularizer $\lambda_T$. (b) RMSE contour surface for the H3D-VAR experiment with different choices of the regularizer $\lambda_H$ and the threshold value $\tau$ of the Huber norm. Clearly, depending on the choice of the regularization method, the magnitude of the regularizer might be markedly different.

Figure 12. (a) True forecast state obtained by temporal evolution of the top-hat initial condition under the heat equation at $t = 10$ (T) (Figure 10a). (b and c) Magnified windows showing the forecast quality using classic and regularized 3D-VAR assimilation methods. It can be seen that, due to ineffective error removal by the classic 3D-VAR and T3D-VAR at the analysis cycle, large-scale correlated errors are propagated in the forecast profiles, while this problem is less substantial in the result of the H3D-VAR (see Table 3).

6. Conclusions
[58] In this paper, we presented a new direction in approaching hydrometeorological estimation problems by taking into account important intrinsic properties of the underlying state of interest, such as the presence of sharp jumps, isolated singularities (i.e., local extremes), and statistical sparsity in the derivative space. We started by explaining the concept of regularization and discussed the common elements of the hydrometeorological problems of DS, DF, and DA as discrete linear inverse problems. We argued about the importance of proper regularization, which not only makes hydrometeorological inverse problems sufficiently well posed but also imposes the desired regularity and statistical property on the solution. Regularization methods were theoretically linked to the underlying statistical structure of the states and it was shown how information about the probability density of the state, or its derivative, can be used for proper selection of the regularization method. Specifically, we emphasized three types of regularization, namely, the Tikhonov, $\ell_1$-norm, and Huber regularization methods. We argued that these methods are statistically equivalent to the maximum a posteriori (MAP) estimator while, respectively, assuming the Gaussian, Laplace, and Gibbs prior density for the state of interest in a derivative domain. It was argued that piecewise continuity of the state and the presence of frequent jumps are often translated into heavy-tailed distributions in the derivative space
that favor the use of $\ell_1$-norm or Huber-norm regularization methods.

[59] The effectiveness of the regularized DS and DF problems was tested via analysis of remotely sensed precipitation fields, and the superiority of the regularization with linear penalization was clearly demonstrated. The performance of the regularized DA was also studied via assimilating noisy observations into the evolution of the heat equation, which has fundamental applications in the study and data assimilation of land-surface heat and mass fluxes. We showed that adding a Huber regularization term in the variational assimilation methods outperforms the classic 3D-VAR method, especially for the case where the initial condition exhibits a sparse distribution in the derivative space (e.g., first-order derivative of the top-hat initial condition).

[60] The presented frameworks can be potentially applied to other hydrometeorological problems, such as soil moisture downscaling, fusion, and data assimilation. Clearly, proper selection of the regularization method requires careful statistical analysis of the underlying state of interest. Moreover, the problem of rainfall or soil moisture retrieval from satellite microwave radiance can be considered as a nonlinear inverse problem. This nonlinear inversion may be cast in the presented context, provided that the nonlinear kernel can be (locally) linearized with sufficient accuracy. Application of regularization in data assimilation is in its infancy (e.g., see Freitag et al. [2012] for a recent study) and is expected to play a significant role over the next decades, especially in the context of ensemble methodologies for non-Gaussian and highly nonlinear dynamic systems.

Appendix A: Statistical Interpretation
[61] In this appendix, we discuss the statistical interpretation of the presented downscaling, data fusion, and data assimilation problems. We argue that the classic weighted least squares formulations can be interpreted as the frequentist maximum likelihood (ML) estimators, while the regularized formulations can be interpreted as the Bayesian maximum a posteriori (MAP) estimators. We also spell out the connection between the chosen regularization and the prior distribution of the state (or its derivative), which can guide proper selection of the regularization term in practical applications.

A1. Regularized Variational Downscaling and Data Fusion
[62] From the frequentist statistical point of view, it is easy to show that the WLS solution of equation (3) is equivalent to the maximum likelihood (ML) estimator

$$\hat{x}_{ML} = \underset{x}{\operatorname{argmax}}\; p(y|x), \tag{A1}$$

given that the conditional density, $p(y|x) \propto \exp\left(-\tfrac{1}{2}(y - Hx)^T R^{-1} (y - Hx)\right)$, is Gaussian. Specifically, applying $-\log(\cdot)$ turns the maximization in (A1) into the minimization of the negative log-likelihood function $-\log\{p(y|x)\}$.
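The equivalence between the Gaussian ML estimator and the WLS solution can be checked numerically. The following Python/NumPy sketch is only an illustration and is not part of the experiments reported here: the operator `H`, the diagonal error covariance, the problem sizes, and all variable names are hypothetical choices. A tall random `H` is assumed so that $H^T R^{-1} H$ is invertible, which does not hold for the ill-posed downscaling operator and is exactly why regularization is needed there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem (not from the paper): n noisy observations of m unknowns.
n, m = 12, 4
H = rng.normal(size=(n, m))             # assumed full-column-rank operator
x_true = rng.normal(size=m)
sigma = rng.uniform(0.05, 0.2, size=n)  # observation error std; R = diag(sigma**2)
R_inv = np.diag(1.0 / sigma**2)
y = H @ x_true + sigma * rng.normal(size=n)


def neg_log_likelihood(x):
    """0.5 * (y - Hx)^T R^{-1} (y - Hx), i.e., -log p(y|x) up to a constant."""
    r = y - H @ x
    return 0.5 * r @ R_inv @ r


# WLS solution from the normal equations (H^T R^-1 H) x = H^T R^-1 y.
x_wls = np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ y)

# Independent route to the ML estimate: whiten by R^{-1/2}, then ordinary least squares.
W = np.diag(1.0 / sigma)
x_ml, *_ = np.linalg.lstsq(W @ H, W @ y, rcond=None)

# Gradient of the negative log-likelihood evaluated at the WLS solution.
grad_at_wls = -H.T @ R_inv @ (y - H @ x_wls)

print("max |x_wls - x_ml|:", np.max(np.abs(x_wls - x_ml)))
print("max |gradient|    :", np.max(np.abs(grad_at_wls)))
```

Both routes should agree to numerical precision, and the gradient of the negative log-likelihood should vanish at the WLS solution, which is what the ML interpretation requires in this well-posed toy setting.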

For the Gaussian density above, this minimizer is

$$\hat{x}_{ML} = \underset{x}{\operatorname{argmin}} \left\{ \frac{1}{2}(y - Hx)^T R^{-1} (y - Hx) \right\} = \underset{x}{\operatorname{argmin}} \left\{ \frac{1}{2}\|y - Hx\|^2_{R^{-1}} \right\}, \tag{A2}$$

which is identical to the WLS solution of problem (3).

[63] It is important to note that in the ML estimator, $x$ is considered to be a deterministic variable (fixed), while $y$ has a random nature. On the other hand, in the Bayesian perspective, a regularized solution of equations (4) or (6) is equivalent to the maximum a posteriori (MAP) estimator

$$\hat{x}_{MAP} = \underset{x}{\operatorname{argmax}}\; p(x|y), \tag{A3}$$

where both $x$ and $y$ are considered of random nature. Specifically, using the Bayes theorem, ignoring the constant terms in $x$ and applying $-\log(\cdot)$ on the posterior density $p(x|y)$, we get

$$\hat{x}_{MAP} = \underset{x}{\operatorname{argmin}} \left\{ -\log \frac{p(y|x)\,p(x)}{p(y)} \right\} = \underset{x}{\operatorname{argmin}} \left\{ -\log p(y|x) - \log p(x) \right\}. \tag{A4}$$

[64] The first term, $-\log p(y|x)$, is just the negative log likelihood as it appeared in the ML estimator and the second term is called the prior, which accounts for the a priori knowledge about the density of the state vector $x$. Accordingly, the proposed Tikhonov regularization in equation (4) is equivalent to the MAP estimator assuming that the state, or the linearly transformed state $Lx$, can be explained by a multivariate Gaussian of the following form:

$$-\log p(x) \propto x^T Q x, \tag{A5}$$

where the inverse covariance (precision) matrix is $Q = L^T L$ [e.g., Tikhonov et al., 1977; Elad and Feuer, 1997; Levy, 2008]. Clearly, the choice of the $\ell_1$-norm in equation (6) implies that $-\log p(x) \propto \|Lx\|_1$, or say the transformed state can be well explained by a multivariate Laplace density with heavier tail than the Gaussian case [e.g., Tibshirani, 1996; Lewicki and Sejnowski, 2000], while the Huber-norm regularization implies a Gibbs prior probability model $-\log p(x) \propto \sum_i \rho_\tau(x_i)$ for the state of interest [Geman and Geman, 1984; Schultz and Stevenson, 1994].

[65] Obviously, based on the selected type of regularization, statistical interpretation of the DF regularized class of problems is also similar to what was explained for the DS problem. In other words, given the augmented observation model in equation (11), it is easy to see that the solution of equation (10) is the ML estimator, while equation (14) can be interpreted as the MAP estimator with a prior density depending on the form of the regularization term.

A2. Regularized Variational Data Assimilation
[66] Statistical interpretation of the classic variational DA problems is a bit tricky compared to the DS and DF class of problems, mainly because of the involvement of the background information in the cost function. Lorenc [1986] derived the 3D-VAR cost function using Bayes theorem and called it the ML estimator [see, e.g., Lorenc, 1988; Bouttier and Courtier, 2002]. More recently, it has been argued that the 4D-VAR, and thus as a special case the 3D-VAR cost function, can be interpreted via the Bayesian MAP estimator [Johnson et al., 2005; Freitag et al., 2010; Nichols, 2010]. For notational convenience, here we only explain the statistical interpretation of the 3D-VAR and its regularized version, which can be easily generalized for the case of the 4D-VAR problem.

[67] As discussed earlier, the ML estimator is basically a frequentist view to estimate the most likely value of an unknown deterministic variable $x$ from (indirect) observations $y$ of random nature. The ML estimator intuitively requires finding the state that maximizes the likelihood function as

$$\hat{x}_{ML} = \underset{x}{\operatorname{argmax}}\; p(y|x). \tag{A6}$$

[68] Let us assume that, at the initial time step $t_0$, the background $x_0^b$ is just a (random) realization of the true deterministic initial state $x_0$. In other words, we consider $x_0^b = x_0 + w$, where the error $w$ can be well explained by a zero mean Gaussian density $\mathcal{N}(0, B)$, uncorrelated with the observation error, $E[w v^T] = 0$. Here the background state is treated similarly to an observation that is of random nature. Thus, let us recast the problem of obtaining the analysis as a classic linear inverse problem by augmenting the available information in the form of

$$\underline{y} = \underline{H} x_0 + \underline{v}, \tag{A7}$$

where $\underline{y} = \left[ (x_0^b)^T, y_0^T \right]^T$, $\underline{H} = \left[ I, H^T \right]^T$, and $\underline{v} \sim \mathcal{N}(0, \underline{R})$, with the following block diagonal covariance matrix

$$\underline{R} = \begin{bmatrix} B & 0 \\ 0 & R \end{bmatrix}. \tag{A8}$$

[69] Note that $\underline{R}$ is block diagonal because the background and observation errors are uncorrelated. Following the augmented representation and applying $-\log(\cdot)$, we have $-\log p(\underline{y}|x_0) \propto \tfrac{1}{2} (\underline{y} - \underline{H} x_0)^T \underline{R}^{-1} (\underline{y} - \underline{H} x_0)$; thus, it is easy to see that the ML estimator in terms of the augmented observations $\underline{y}$,

$$x_0^a = \underset{x_0}{\operatorname{argmax}}\; p(\underline{y}|x_0), \tag{A9}$$

is equivalent to minimizing the 3D-VAR cost function in equation (15). Therefore, from the frequentist perspective, which considers the state deterministic and the observations random, the classic 3D-VAR solution is the ML estimator, assuming Gaussian observation error.

[70] On the other hand, from the Bayesian perspective, the state of interest and the available observations are considered to be random and the MAP estimator is the optimal point, which maximizes the posterior density as:

$$\hat{x}_{MAP} = \underset{x}{\operatorname{argmax}}\; p(x|y). \tag{A10}$$

[71] Let us assume a priori that the (random) state of interest has a Gaussian density with mean $x_0^b$ and covariance
$B$, that is, $p(x_0) \sim \mathcal{N}(x_0^b, B)$. More formally, this assumption implies that the deterministic background is the central (mean) forecast and is related to the random true state via $x_0 = x_0^b + w$, where $w \sim \mathcal{N}(0, B)$. Therefore, using Bayes theorem it immediately follows that the 3D-VAR is the MAP estimator, $x_0^a = \operatorname{argmax}_{x_0}\, p(x_0|y)$, assuming a Gaussian prior for the true state of interest.

[72] In conclusion, if we follow the frequentist approach to interpret the classic 3D-VAR in equation (15), the regularized 3D-VAR in equation (17) can be interpreted as the MAP estimator, where the prior density is characterized by the regularization term. On the other hand, taking the MAP interpretation for the classic 3D-VAR, the regularized version might be understood as the MAP estimator, which also accounts for an extra and independent prior on the distribution of the state under the $L$ transformation.

Appendix B: Gradient Projection Method for the Huber Regularization
[73] Here we present the gradient projection (GP) method, using the Huber regularization, only for the downscaling (DS) problem, which can be easily generalized to the data fusion (DF) and data assimilation (DA) cases. In the case of the DS problem, the cost function and gradient of the Huber regularization with respect to the elements of the downscaled field are

$$J(x) = \frac{1}{2}\|y - Hx\|^2_{R^{-1}} + \lambda \|Lx\|_{Hub} \tag{B1}$$

$$\nabla J(x) = -H^T R^{-1} (y - Hx) + \lambda L^T \rho'_\tau(Lx), \tag{B2}$$

where

$$\rho'_\tau(x) = \begin{cases} 2x, & |x| \le \tau \\ 2\tau\, \operatorname{sign}(x), & |x| > \tau. \end{cases} \tag{B3}$$

[74] As is evident, the cost function in (B1) is a smooth and convex function. Thus, its minimum can be easily obtained using efficient first-order gradient descent methods in large dimensional problems. However, rainfall is a nonnegative process and in order to obtain a feasible downscaled field $\hat{x}$, the regularized DS problem needs to be solved on the nonnegative orthant $\{x \,|\, x_i \ge 0, \; \forall i = 1, \ldots, m\}$,

$$\hat{x} = \underset{x}{\operatorname{argmin}}\; \{J(x)\} \quad \text{s.t.} \quad x \succeq 0. \tag{B4}$$

[75] We have used one of the primitive gradient projection (GP) methods to solve the above constrained DS problem [see Bertsekas, 1999, p. 228]. Accordingly, obtaining the solution of equation (B4) amounts to obtaining the fixed point of the following equation:

$$x^* = \left[ x^* - \beta \nabla J(x^*) \right]^+, \tag{B5}$$

where $\beta$ is a stepsize and

$$[x]^+ = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{otherwise}, \end{cases} \tag{B6}$$

denotes the Euclidean projection operator onto the nonnegative orthant. As is evident, the fixed point can be obtained iteratively as

$$x^{k+1} = \left[ x^k - \beta_k \nabla J(x^k) \right]^+. \tag{B7}$$

[76] Thus, if the descent at step $k$ is feasible (i.e., $x^k - \beta_k \nabla J(x^k) \succeq 0$), the GP iteration becomes an ordinary unconstrained steepest descent method; otherwise, the result is mapped back onto the feasible set by the projection operator in equation (B6).

[77] In our study, the stepsize ($\beta_k$) was selected using the Armijo rule, or the so-called backtracking line search, which is a convergent and very effective stepsize rule that depends on two constants: $0 < \xi < 0.5$, $0 < \varsigma < 1$. In this method, the stepsize is assumed to be $\beta_k = \varsigma^{m_k}$, where $m_k$ is the smallest nonnegative integer for which

$$J\left(x^k - \beta_k \nabla J(x^k)\right) \le J(x^k) - \xi \beta_k \nabla J(x^k)^T \nabla J(x^k). \tag{B8}$$

[78] In our DS examples, the above backtracking parameters are set to $\xi = 0.2$ and $\varsigma = 0.5$ (see Boyd and Vandenberghe [2004, p. 464] for more explanation). In our coding, the iterations terminate either if $\|x^k - x^{k-1}\|_2 / \|x^{k-1}\|_2 \le 10^{-5}$ or the number of iterations exceeds 200.

[79] For the above-explained gradient projection algorithm and the employed parameters, the computational cost of the proposed framework is modest for a normal desktop machine at the present time. In particular, on a Windows operating system with an Intel(R)-i7 central processing unit (2.80 GHz clock rate), the process time of the presented downscaling and data fusion experiments was approximately 120 s.

[80] Acknowledgments. This work has been supported by an Interdisciplinary Doctoral Fellowship (IDF) of the University of Minnesota Graduate School and the NASA-GPM award NNX07AD33G. Partial support by a NASA Earth and Space Science Fellowship (NESSF-NNX12AN45H) to the first author and the Ling chaired professorship to the second author is also gratefully acknowledged. Thanks also go to Arthur Hou and Sara Zhang at NASA-Goddard Space Flight Center for their support and insightful discussions.

References
Adler, R., et al. (2003), The version 2 global precipitation climatology project GPCP monthly precipitation analysis (1979-present), J. Hydrometeorol., 4(6), 1147–1167.
AghaKouchak, A., A. Mehran, H. Norouzi, and A. Behrangi (2012), Systematic and random error components in satellite precipitation data sets, Geophys. Res. Lett., 39, L09406, doi:10.1029/2012GL051592.
Badas, M. G., R. Deidda, and E. Piga (2006), Modulation of homogeneous space-time rainfall cascades to account for orographic influences, Nat. Hazards Earth Syst. Sci., 6(3), 427–437, doi:10.5194/nhess-6-427-2006.
Bateni, S. M., and D. Entekhabi (2012), Surface heat flux estimation with the ensemble Kalman smoother: Joint estimation of state and parameters, Water Resour. Res., 48, W08521, doi:10.1029/2011WR011542.
Bertsekas, D. P. (1999), Nonlinear Programming, 2nd ed., 794 pp., Athena Scientific, Belmont, Mass.
Bocchiola, D. (2007), Use of scale recursive estimation for assimilation of precipitation data from TRMM (PR and TMI) and NEXRAD, Adv. Water Resour., 30(11), 2354–2372, doi:10.1016/j.advwatres.2007.05.012.
Bouttier, F., and P. Courtier (2002), Data assimilation concepts and methods, Meteorological Training Course Lecture Series, ECMWF, p. 59.
Boyd, S., and L. Vandenberghe (2004), Convex Optimization, 716 pp., Cambridge Univ. Press, New York.
Candes, E. J., and T. Tao (2006), Near-optimal signal recovery from random projections: Universal encoding strategies?, IEEE Trans. Inf. Theory, 50(12), 5406–5425, doi:10.1109/TIT.2006.885507.
Castro, C. J., A. S. Pielke Roger, and G. Leoncini (2005), Dynamical downscaling: Assessment of value retained and added using the Regional Atmospheric Modeling System RAMS, J. Geophys. Res., 110, D05108, doi:10.1029/2004JD004721.
Chen, S. S., D. L. Donoho, and M. A. Saunders (1998), Atomic decomposition by basis pursuit, SIAM J. Sci. Comput., 20, 33–61.
Chen, S., D. Donoho, and M. Saunders (2001), Atomic decomposition by basis pursuit, SIAM Rev., 43(1), 129–159.
Ciach, G. J., and W. F. Krajewski (1999), On the estimation of radar rainfall error variance, Adv. Water Resour., 22(6), 585–595.
Cooley, J. W., and J. W. Tukey (1965), An algorithm for the machine calculation of complex Fourier series, Math. Comput., 19(90), 297–301.
Courtier, P., and O. Talagrand (1990), Variational assimilation of meteorological observations with the direct and adjoint shallow-water equations, Tellus, Ser. A, 42(5), 531–549.
Courtier, P., J.-N. Thepaut, and A. Hollingsworth (1994), A strategy for operational implementation of 4D-VAR, using an incremental approach, Q. J. R. Meteorol. Soc., 120(519), 1367–1387, doi:10.1002/qj.49712051912.
Daley, R. (1993), Atmospheric Data Analysis, 472 pp., Cambridge Univ. Press, Cambridge, U.K.
Deidda, R. (2000), Rainfall downscaling in a space-time multifractal framework, Water Resour. Res., 36(7), 1779–1794.
Drusch, M., and P. Viterbo (2007), Assimilation of screen-level variables in ECMWF’s integrated forecast system: A study on the impact on the forecast quality and analyzed soil moisture, Mon. Weather Rev., 135(2), 300–314, doi:10.1175/MWR3309.1.
Ebtehaj, A. M., and E. Foufoula-Georgiou (2011a), Adaptive fusion of multisensor precipitation using Gaussian-scale mixtures in the wavelet domain, J. Geophys. Res., 116, D22110, doi:10.1029/2011JD016219.
Ebtehaj, A. M., and E. Foufoula-Georgiou (2011b), Statistics of precipitation reflectivity images and cascade of Gaussian-scale mixtures in the wavelet domain: A formalism for reproducing extremes and coherent multiscale structures, J. Geophys. Res., 116, D14110, doi:10.1029/2010JD015177.
Ebtehaj, A. M., E. Foufoula-Georgiou, and G. Lerman (2012), Sparse regularization for precipitation downscaling, J. Geophys. Res., 116, D22110, doi:10.1029/2011JD017057.
Elad, M. (2010), Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, 376 pp., Springer, New York.
Elad, M., and A. Feuer (1997), Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images, IEEE Trans. Image Process., 6(12), 1646–1658, doi:10.1109/83.650118.
Entekhabi, D., H. Nakamura, and E. Njoku (1994), Solving the inverse problem for soil moisture and temperature profiles by sequential assimilation of multifrequency remotely sensed observations, IEEE Trans. Geosci. Remote Sens., 32(2), 438–448, doi:10.1109/36.295058.
Freitag, M. A., N. K. Nichols, and C. J. Budd (2010), L1-regularisation for ill-posed problems in variational data assimilation, Proc. Appl. Math. Mech., 10(1), 665–668, doi:10.1002/pamm.201010324.
Freitag, M. A., N. K. Nichols, and C. J. Budd (2012), Resolution of sharp fronts in the presence of model error in variational data assimilation, Q. J. R. Meteorol. Soc., 139, 742–757, doi:10.1002/qj.2002.
Gaspari, G., and S. E. Cohn (1999), Construction of correlation functions in two and three dimensions, Q. J. R. Meteorol. Soc., 125(554), 723–757, doi:10.1002/qj.49712555417.
Geman, S., and D. Geman (1984), Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell., 6(6), 721–741.
Golub, G., P. Hansen, and D. O’Leary (1999), Tikhonov regularization and total least squares, SIAM J. Matrix Anal. Appl., 21(1), 185–194.
Gorenburg, I. P., D. McLaughlin, and D. Entekhabi (2001), Scale-recursive assimilation of precipitation data, Adv. Water Resour., 24(9–10), 941–953, doi:10.1016/S0309-1708(01)00033-1.
Gupta, V., and E. Waymire (1993), A statistical analysis of mesoscale rainfall as a random cascade, J. Appl. Meteorol., 32, 251–251.
Hansen, P. (1998), Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, vol. 4, Soc. for Ind. Math., Philadelphia, Pa.
Hansen, P. (2010), Discrete Inverse Problems: Insight and Algorithms, vol. 7, Soc. for Ind. and Appl. Math., Philadelphia, Pa.
Hong, Y., K. Hsu, S. Sorooshian, and X. Gao (2004), Precipitation estimation from remotely sensed imagery using an artificial neural network cloud classification system, J. Appl. Meteorol., 43(12), 1834–1853.
Hossain, F., and E. N. Anagnostou (2005), Numerical investigation of the impact of uncertainties in satellite rainfall estimation and land surface model parameters on simulation of soil moisture, Adv. Water Resour., 28(12), 1336–1350.
Hossain, F., and E. N. Anagnostou (2006), A two-dimensional satellite rainfall error model, IEEE Trans. Geosci. Remote Sens., 44(6), 1511–1522.
Huber, P. (1964), Robust estimation of a location parameter, Ann. Math. Stat., 35(1), 73–101.
Huber, P. (1981), Robust Statistics, vol. 1, John Wiley, New York.
Huffman, G., R. Adler, B. Rudolf, U. Schneider, and P. Keehn (1995), Global precipitation estimates based on a technique for combining satellite-based estimates, rain gauge analysis, and NWP model precipitation information, J. Clim., 8(5), 1284–1295.
Huffman, G., R. Adler, M. Morrissey, D. Bolvin, S. Curtis, R. Joyce, B. McGavock, and J. Susskind (2001), Global precipitation at one-degree daily resolution from multisatellite observations, J. Hydrometeorol., 2(1), 36–50.
Huffman, G., D. Bolvin, E. Nelkin, D. Wolff, R. Adler, G. Gu, Y. Hong, K. Bowman, and E. Stocker (2007), The TRMM multisatellite precipitation analysis (TMPA): Quasi-global, multiyear, combined-sensor precipitation estimates at fine scales, J. Hydrometeorol., 8(1), 38–55.
Johnson, C., N. K. Nichols, and B. J. Hoskins (2005), Very large inverse problems in atmosphere and ocean modelling, Int. J. Numer. Methods Fluids, 47(8–9), 759–771, doi:10.1002/fld.869.
Kalnay, E. (2003), Atmospheric Modeling, Data Assimilation, and Predictability, 341 pp., Cambridge Univ. Press, New York.
Kim, G., and A. Barros (2002), Downscaling of remotely sensed soil moisture with a modified fractal interpolation method using contraction mapping and ancillary data, Remote Sens. Environ., 83(3), 400–413.
Krajewski, W. F., B. Vignal, B. C. Seo, and G. Villarini (2011), Statistical model of the range-dependent error in radar-rainfall estimates due to the vertical profile of reflectivity, J. Hydrol., 402, 306–316, doi:10.1016/j.jhydrol.2011.03.024.
Kumar, P. (1999), A multiple scale state-space model for characterizing subgrid scale variability of near-surface soil moisture, IEEE Trans. Geosci. Remote Sens., 37(1), 182–197, doi:10.1109/36.739153.
Kumar, P., and E. Foufoula-Georgiou (1993), A multicomponent decomposition of spatial rainfall fields. 2. Self-similarity in fluctuations, Water Resour. Res., 29(8), 2533–2544.
Kummerow, C. D., S. Ringerud, J. Crook, D. Randel, and W. Berg (2010), An observationally generated a priori database for microwave rainfall retrievals, J. Atmos. Oceanic Technol., 28(2), 113–130, doi:10.1175/2010JTECHA1468.1.
Ledoit, O., and M. Wolf (2004), A well-conditioned estimator for large-dimensional covariance matrices, J. Multivariate Anal., 88(2), 365–411.
Levy, B. C. (2008), Principles of Signal Detection and Parameter Estimation, 1st ed., 639 pp., Springer, New York, doi:10.1007/978-0-387-76544-0.
Lewicki, M., and T. Sejnowski (2000), Learning overcomplete representations, Neural Comput., 12(2), 337–365.
Liang, X., E. F. Wood, and D. P. Lettenmaier (1999), Modeling ground heat flux in land surface parameterization schemes, J. Geophys. Res., 104(D8), 9581–9600.
Lorenc, A. (1988), Optimal nonlinear objective analysis, Q. J. R. Meteorol. Soc., 114(479), 205–240.
Lorenc, A. C. (1986), Analysis methods for numerical weather prediction, Q. J. R. Meteorol. Soc., 112(474), 1177–1194, doi:10.1002/qj.49711247414.
Lovejoy, S., and B. Mandelbrot (1985), Fractal properties of rain, and a fractal model, Tellus, Ser. A, 37(3), 209–232.
Lovejoy, S., and D. Schertzer (1990), Multifractals, universality classes and satellite and radar, J. Geophys. Res., 95(D3), 2021–2034.
Maggioni, V., R. H. Reichle, and E. N. Anagnostou (2012), The impact of rainfall error characterization on the estimation of soil moisture fields in a land data assimilation system, J. Hydrometeorol., 13(3), 1107–1118, doi:10.1175/JHM-D-11-0115.1.
Margulis, S. A., and D. Entekhabi (2003), Variational assimilation of radiometric surface temperature and reference-level micrometeorology into a model of the atmospheric boundary layer and land surface, Mon. Weather Rev., 131(7), 1272–1288, doi:10.1175/1520-0493(2003)131<1272:VAORST>2.0.CO;2.
Margulis, S. A., D. McLaughlin, D. Entekhabi, and S. Dunne (2002), Land data assimilation and estimation of soil moisture using measurements from the Southern Great Plains 1997 Field Experiment, Water Resour. Res., 38(12), 1299, doi:10.1029/2001WR001114.
Masunaga, H., and C. Kummerow (2005), Combined radar and radiometer analysis of precipitation profiles for a parametric retrieval algorithm, J. Atmos. Oceanic Technol., 22(7), 909–929.
Merlin, O., A. Chehbouni, Y. Kerr, and D. Goodrich (2006), A downscaling method for distributing surface soil moisture within a microwave pixel: Application to the Monsoon’90 data, Remote Sens. Environ., 101(3), 379–389.
Nichols, N. K. (2010), Mathematical concepts of data assimilation, in Data Assimilation: Making Sense of Observations, Part 1, pp. 13–39, Springer, Berlin.
Parrish, D. F., and J. C. Derber (1992), The National Meteorological Center’s spectral statistical-interpolation analysis system, Mon. Weather Rev., 120(8), 1747–1763, doi:10.1175/1520-0493(1992)120<1747:TNMCSS>2.0.CO;2.
Perica, S., and E. Foufoula-Georgiou (1996), Model for multiscale disaggregation of spatial rainfall based on coupling meteorological and scaling, J. Geophys. Res., 101(D21), 26,347–26,361.
Peter-Lidard, C. D., M. S. Zion, and E. F. Wood (1997), A soil-vegetation-atmosphere transfer scheme for modeling spatially variable water and energy balance processes, J. Geophys. Res., 102(D4), 4303–4324.
Rebora, N., L. Ferraris, J. Von Hardenberg, and A. Provenzale (2005), Stochastic downscaling of LAM predictions: An example in the Mediterranean area, Adv. Geosci., 2, 181–185.
Rebora, N., L. Ferraris, J. Von Hardenberg, and A. Provenzale (2006), Rainfall downscaling and flood forecasting: A case study in the Mediterranean area, Nat. Hazards Earth Syst. Sci., 6(4), 611–619.
Reichle, R., D. Entekhabi, and D. McLaughlin (2001a), Downscaling of radio brightness measurements for soil moisture estimation: A four-dimensional variational data assimilation approach, Water Resour. Res., 37(9), 2353–2364.
Reichle, R., D. McLaughlin, and D. Entekhabi (2001b), Variational data assimilation of microwave radio brightness observations for land surface hydrology applications, IEEE Trans. Geosci. Remote Sens., 39(8), 1708–1718, doi:10.1109/36.942549.
Sasaki, Y. (1958), An objective analysis based on variational method, J. Meteorol. Soc. Jpn., 36, 77–88.
Schultz, R., and R. Stevenson (1994), A Bayesian approach to image expansion for improved definition, IEEE Trans. Image Process., 3(3), 233–242.
Siccardi, F., G. Boni, L. Ferraris, and R. Rudari (2005), A hydrometeorological approach for probabilistic flood forecast, J. Geophys. Res., 110, D05101, doi:10.1029/2004JD005314.
Sorooshian, S., K. Hsu, G. Xiaogang, H. Gupta, B. Imam, and … estimates of tropical rainfall, Bull. Am. Meteorol. Soc., 81(9), 2035–2046.
Talagrand, O., and P. Courtier (1987), Variational assimilation of meteorological observations with the adjoint vorticity equation. I: Theory, Q. J. R. Meteorol. Soc., 113(478), 1311–1328.
Tibshirani, R. (1996), Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B, 58(1), 267–288.
Tikhonov, A., V. Arsenin, and F. John (1977), Solutions of Ill-Posed Problems, Winston, Washington, D. C.
Tustison, B., E. Foufoula-Georgiou, and D. Harris (2003), Scale-recursive estimation for multisensor Quantitative Precipitation Forecast verification: A preliminary assessment, J. Geophys. Res., 107(D8), 8377, doi:10.1029/2001JD001073.
Van de Vyver, H., and E. Roulin (2009), Scale-recursive estimation for merging precipitation data from radar and microwave cross-track scanners, J. Geophys. Res., 114, D08104, doi:10.1029/2008JD010709.
Veneziano, D., R. L. Bras, and J. D. Niemann (1996), Nonlinearity and self-similarity of rainfall in time and a stochastic model, J. Geophys. Res., 101(D21), 26,371–26,392.
Wang, S., X. Liang, and Z. Nan (2011), How much improvement can precipitation data fusion achieve with a multiscale Kalman Smoother-based framework?, Water Resour. Res., 47, W00H12, doi:10.1029/2010WR009953.
Wang, Z., A. Bovik, H. Sheikh, and E. Simoncelli (2004), Image quality assessment: From error visibility to structural similarity, IEEE Trans. Image Process., 13(4), 600–612.
Wilby, R., T. Wigley, D. Conway, P. Jones, B. J. M. Hewitson, and D. S. Wilks (1998a), Statistical downscaling of general circulation model output: A comparison of methods, Water Resour. Res., 34(11), 2995–3008, doi:10.1029/98WR02577.
Wilby, R., H. Hassan, and K. Hanaki (1998b), Statistical downscaling of hydrometeorological variables using general circulation model output, J. Hydrol., 205(1), 1–19.
Zupanski, D., S. Q. Zhang, M. Zupanski, A. Y. Hou, and S. H. Cheung (2010), A prototype WRF-based ensemble data assimilation system for dynamically downscaling satellite precipitation observations, J. Hydrometeorol., 12(1), 118–134, doi:10.1175/2010JHM1271.1.
Zupanski, M. (1993), Regional four-dimensional variational data assimilation in a quasi-operational forecasting environment, Mon. Weather Rev.,
D. Braithwaite (2000), Evaluation of PERSIANN system satellite-based 121(8), 2396–2408.

