A Unified Theory of Sparse Stochastic Processes
$$s_\alpha(t) = (\rho_\alpha * w)(t) \qquad (1)$$

where $\rho_\alpha(t) = 1_+(t)\, e^{\alpha t}$ with $\mathrm{Re}(\alpha) < 0$ and $1_+(t)$ is the unit-step function. Since $\rho_\alpha = (D - \alpha\,\mathrm{Id})^{-1}\delta$, where $D = \frac{d}{dt}$ and $\mathrm{Id}$ are the derivative and identity operators, respectively, $s_\alpha$ satisfies the first-order stochastic differential equation (SDE)

$$d s_\alpha(t) - \alpha\, s_\alpha(t)\, dt = dW(t), \qquad (2)$$

where $W(t) = \int_0^t w(\tau)\, d\tau$ is a standard Brownian motion (or Wiener process) excitation. In the statistical literature, the solution of the above first-order SDE is often called the Ornstein-Uhlenbeck process.
Let $\big(s_\alpha[k] = s_\alpha(t)|_{t=k}\big)_{k \in \mathbb{Z}}$ denote the sampled version of the continuous-time process. Then, one can show that $s_\alpha[\cdot]$ is a discrete AR(1) autoregressive process that can be whitened by applying the first-order linear predictor:

$$s_\alpha[k] - e^{\alpha}\, s_\alpha[k-1] = u[k] \qquad (3)$$
where $u[\cdot]$ (prediction error) is an i.i.d. Gaussian sequence. Alternatively, one can decorrelate the signal by computing its discrete cosine transform (DCT), which is known to be asymptotically equivalent to the Karhunen-Loève transform (KLT) of the process [29], [30]. Eq. (3) provides the basis for classical linear predictive coding (LPC), while the decorrelation property of the DCT is often invoked to justify the popular JPEG transform-domain coding scheme [31].
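The whitening relation (3) is easy to verify numerically. The following sketch (a hypothetical illustration assuming NumPy; the sampled process is generated directly from its AR(1) recursion with $\alpha = -0.1$, an arbitrary choice) checks that the lag-1 correlation of $s$ is $e^{\alpha}$ and that the prediction error is white:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = -0.1                # continuous-time pole, Re(alpha) < 0
r = np.exp(alpha)           # AR(1) coefficient e^alpha in Eq. (3)
n = 100_000

# Sampled Ornstein-Uhlenbeck process as a stationary AR(1) recursion.
u = rng.standard_normal(n)
s = np.empty(n)
s[0] = u[0] / np.sqrt(1 - r**2)      # draw s[0] from the stationary law
for k in range(1, n):
    s[k] = r * s[k - 1] + u[k]

rho_s = np.corrcoef(s[:-1], s[1:])[0, 1]   # lag-1 correlation, close to e^alpha

# First-order linear predictor (3): the residual is the white innovation.
e = s[1:] - r * s[:-1]
rho_e = np.corrcoef(e[:-1], e[1:])[0, 1]   # close to 0 for a whitened sequence

print(abs(rho_s - r) < 0.01, abs(rho_e) < 0.01)  # prints: True True
```

The same residual test fails when the predictor coefficient is mismatched, which is the basis of classical LPC parameter estimation.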
In this paper, we are concerned with the non-Gaussian counterpart of this story, which, as we shall see, will result in the identification of sparse processes. The idea is to retain the simplicity of the classical innovation model, while substituting the continuous-time Gaussian noise by some generalized Lévy innovation (to be properly defined in the
October 8, 2012 DRAFT
Fig. 2. Comparison of operator-like and conventional wavelet basis functions ($\psi_i = \mathrm{L}^*\phi_i$) at two successive scales: (a) first-order E-spline wavelets with $\alpha = 0.5$; (b) Haar wavelets. The vertical axis is rescaled for full-range display.
sequel). This translates into Eqs. (1)-(3) remaining valid, except that the underlying random variates are no longer Gaussian. The more significant finding is that the KLT (or its discrete approximation by the DCT) is no longer optimal for producing the best M-term approximation of the signal. This is illustrated in Fig. 1, which compares the performance of various transforms for the compression of two kinds of AR(1) processes with correlation $e^{-0.1} \approx 0.90$: Gaussian vs. sparse, where the latter innovation follows a Cauchy distribution. The key observation is that the E-spline wavelet transform, which is matched to the operator $\mathrm{L} = D - \alpha\,\mathrm{Id}$, provides the best results in the non-Gaussian scenario over the whole range of experimentation [cf. Fig. 1(b)], while the outcome in the Gaussian case is as predicted by the classical theory with the KLT being superior. Examples of orthogonal E-spline wavelets at two successive scales are shown in Fig. 2 next to their Haar counterparts. We selected the E-spline wavelets
because of their ability to decouple the process, which follows from their operator-like behavior: $\psi_i = \mathrm{L}^*\phi_i$, where $i$ is the scale index and $\phi_i$ a suitable smoothing kernel [32, Theorem 2]. Unlike their conventional cousins, they are not dilated versions of each other, but rather extrapolations in the sense that the slope of the exponential segments remains the same at all scales. They can, however, be computed efficiently using a perfect-reconstruction filterbank with scale-dependent filters [32].
The equivalence with traditional wavelet analysis (Haar) and finite-differencing (as used in the computation of total variation) for signal sparsification is achieved by letting $\alpha \to 0$. The catch, however, is that the underlying system becomes unstable! Fortunately, the problem can be fixed, but it calls for an advanced mathematical treatment that is beyond the traditional formulation of stationary processes. The remainder of the paper is devoted to giving a proper sense to what has just been described informally, and to extending the approach to the whole class of ordinary differential operators, including the non-stable scenarios. The non-trivial outcome, as we shall see, is that many non-stable systems are linked with non-stationary stochastic processes. These, in turn, can be stationarized and sparsified by application of a suitable wavelet transformation. The companion paper [28] is focused on the discrete aspects of the theory, including the generalization of (3) for decoupling purposes and the full characterization of the underlying processes.
III. MATHEMATICAL BACKGROUND
The purpose of this section is to introduce the distributional formalism that is required for the proper definition of the continuous-time white noise that is the driving term of (1) and its generalization. We start with a brief summary of some required notions in functional analysis, which also serves to set the notation. We then introduce the fundamental concept of the characteristic functional, which constitutes the foundation of Gelfand's theory of generalized stochastic processes. We proceed by giving the complete characterization of the possible types of continuous-domain white noises (not necessarily Gaussian), which will be used as universal inputs for our innovation models. We conclude the section by showing that the non-Gaussian brands of noise that are allowed by Gelfand's formulation are intrinsically sparse, a property that has not been emphasized before (to the best of our knowledge).
A. Functional and distributional context
The $L_p$-norm of a function $f = f(t)$ is $\|f\|_p = \left(\int_{\mathbb{R}} |f(t)|^p\, dt\right)^{1/p}$ for $1 \le p < \infty$ and $\|f\|_\infty = \operatorname{ess\,sup}_{t \in \mathbb{R}} |f(t)|$ for $p = +\infty$, with the corresponding Lebesgue space being denoted by $L_p = L_p(\mathbb{R})$. The concept is extendable for characterizing the rate of decay of functions. To that end, we introduce the weighted $L_{p,\alpha}$ spaces with $\alpha \in \mathbb{R}^+$:

$$L_{p,\alpha} = \left\{ f \in L_p : \|f\|_{p,\alpha} < +\infty \right\}$$

where the $\alpha$-weighted $L_p$-norm of $f$ is defined as

$$\|f\|_{p,\alpha} = \left\| \left(1 + |\cdot|^\alpha\right) f(\cdot) \right\|_p.$$
Hence, the statement $f \in L_{\infty,\alpha}$ implies that $f(t)$ decays at least as fast as $1/|t|^\alpha$. We will also rely on Schwartz's class $\mathcal{S}$ of smooth and rapidly decaying test functions$^1$

$$\mathcal{S} = \left\{ \varphi \in C^\infty(\mathbb{R}) : \|D^n \varphi\|_{\infty,m} < +\infty,\ \forall m, n \in \mathbb{Z}^+ \right\},$$

with the operator notation $D^n = \frac{d^n}{dt^n}$ and the convention that $D^0 = \mathrm{Id}$ (identity). $\mathcal{S}$ is a complete topological vector
space. Its topological dual is the space of tempered distributions $\mathcal{S}'$; a distribution $\varphi \in \mathcal{S}'$ is a continuous linear functional on $\mathcal{S}$ that is characterized by a duality product rule $\varphi(\phi) = \langle \varphi, \phi \rangle = \int_{\mathbb{R}} \varphi(t)\phi(t)\, dt$ with $\phi \in \mathcal{S}$, where the right-hand-side expression has a literal interpretation as an integral only when $\varphi(t)$ is a true function of $t$. The prototypical example of a tempered distribution is the Dirac distribution $\delta$, which is defined as $\delta(\phi) = \langle \delta, \phi \rangle = \phi(0)$. In the sequel, we will drop the explicit dependence of the distribution on the generic test function $\phi \in \mathcal{S}$ and simply write $\varphi$ or even $\varphi(t)$ (with an abuse of notation).
$^1$ The topology of $\mathcal{S}$ is defined by the family of semi-norms $\|\cdot\|_{\infty,m}$, $m = 1, 2, 3, \ldots$
Let $T$ be a continuous$^2$ linear operator that maps $\mathcal{S}$ into itself (or eventually some enlarged topological space such as $L_p$). It is then possible to extend the action of $T$ over $\mathcal{S}'$ (or an appropriate subset of it) based on the definition

$$\langle T\varphi, \phi \rangle = \langle \varphi, T^*\phi \rangle$$

if $T^*$ maps $\mathcal{S}$ into $\mathcal{S}$ continuously. An important example is the Fourier transform, whose classical definition is $\mathcal{F}f(\omega) = \hat{f}(\omega) = \int_{\mathbb{R}} f(t)\, e^{-j\omega t}\, dt$. Since $\mathcal{F}$ is a self-adjoint $\mathcal{S}$-continuous operator, it is extendable to $\mathcal{S}'$ based on the adjoint relation $\langle \mathcal{F}\varphi, \phi \rangle = \langle \varphi, \mathcal{F}\phi \rangle$ for all $\phi \in \mathcal{S}$ (generalized Fourier transform).
A linear, shift-invariant (LSI) operator that is well-defined over $\mathcal{S}$ can always be written as a convolution product:

$$T_{\mathrm{LSI}}\varphi(t) = (h * \varphi)(t) = \int_{\mathbb{R}} h(\tau)\,\varphi(t - \tau)\, d\tau$$

where $h(t) = T_{\mathrm{LSI}}\delta(t)$ is the impulse response of the system. The adjoint operator is the convolution with the time-reversed version of $h$: $h^\vee(t) = h(-t)$. The better-known categories of LSI operators are the BIBO-stable (bounded-input, bounded-output) filters and the ordinary differential operators. While the latter are not BIBO-stable, they do work well with test functions.
1) $L_p$-stable LSI operators: The BIBO-stable filters correspond to the case where $h \in L_1$ or, more generally, where $h$ corresponds to a complex-valued Borel measure of bounded variation. The latter extension allows for discrete filters of the form $h_d(t) = \sum_{n \in \mathbb{Z}} d[n]\,\delta(t - n)$ with $d[n] \in \ell_1$. We will refer to these filters as $L_p$-stable because they are bounded in all $L_p$-norms (by Young's inequality). $L_p$-stable convolution operators satisfy the properties of commutativity, associativity, and distributivity with respect to addition.
2) $\mathcal{S}$-continuous LSI operators: For an $L_p$-stable filter to yield a Schwartz function as output, it is necessary that its impulse response (continuous or discrete) be rapidly decaying. In fact, the condition $h \in \mathcal{R}$ (the space of rapidly decaying functions, which is much stronger than integrability) ensures that the filter is $\mathcal{S}$-continuous. The $n$th-order derivative $D^n$ and its adjoint $D^{n*} = (-1)^n D^n$ are in the same category. The $n$th-order weak derivative of the tempered distribution $\varphi$ is defined as

$$D^n\varphi(\phi) = \langle D^n\varphi, \phi \rangle = \langle \varphi, D^{n*}\phi \rangle$$

for any $\phi \in \mathcal{S}$. The latter operator (or, by extension, any polynomial of distributional derivatives $P_N(D) = \sum_{n=1}^{N} a_n D^n$ with constant coefficients $a_n \in \mathbb{C}$) maps $\mathcal{S}'$ into itself. The class of these differential operators enjoys the same properties as its classical counterpart: shift-invariance, commutativity, associativity, and distributivity.
B. Notion of generalized stochastic process
The leading idea in distribution theory is that a generalized function $\varphi$ is not defined through its point values $\varphi(t)$, $t \in \mathbb{R}$, but rather through its scalar products $\varphi(\phi) = \langle \varphi, \phi \rangle$ with all test functions $\phi \in \mathcal{S}$. In an analogous fashion, Gelfand and Vilenkin define a generalized stochastic process $s$ via the probability law of its scalar products with arbitrary test functions $\phi \in \mathcal{S}$ [17], rather than by considering the probability law of its pointwise samples $\ldots, s(t_1), s(t_2), \ldots, s(t_N), \ldots$, as is customary in the conventional formulation.
$^2$ An operator $T$ is continuous from a (sequential) topological vector space $V$ into another one iff $\varphi_k \to \varphi$ in the topology of $V$ implies that $T\varphi_k \to T\varphi$ in the topology (or norm) of the second space. If the two spaces coincide, we say that $T$ is $V$-continuous.
Let $s$ be such a generalized process. We first observe that the scalar product $X_1 = \langle s, \phi_1 \rangle$ with a given test function $\phi_1$ is a conventional (scalar) random variable that is characterized by its probability density function (pdf) $p_{X_1}(x_1)$; the latter is in one-to-one correspondence (via the Fourier transform) with the characteristic function

$$\hat{p}_{X_1}(\omega_1) = E\{e^{j\omega_1 X_1}\} = \int_{\mathbb{R}} e^{j\omega_1 x_1}\, p_{X_1}(x_1)\, dx_1 = E\{e^{j\langle s,\, \omega_1\phi_1 \rangle}\},$$

where $E\{\cdot\}$ is the expectation operator. The same applies for the 2nd-order pdf $p_{X_1,X_2}(x_1, x_2)$ associated with a pair of test functions $\phi_1$ and $\phi_2$, which is the inverse Fourier transform of the 2-D characteristic function $\hat{p}_{X_1,X_2}(\omega_1, \omega_2) = E\{e^{j\langle s,\, \omega_1\phi_1 + \omega_2\phi_2 \rangle}\}$, and so forth if one wants to specify higher-order dependencies.
The foundation for the theory of generalized stochastic processes is that one can deduce the complete statistical information about the process from the knowledge of its characteristic form

$$\hat{\mathscr{P}}_s(\phi) = E\{e^{j\langle s, \phi \rangle}\} \qquad (4)$$

which is a continuous, positive-definite functional over $\mathcal{S}$ such that $\hat{\mathscr{P}}_s(0) = 1$. Since the variable $\phi$ in $\hat{\mathscr{P}}_s(\phi)$ is completely generic, it provides the equivalent of an infinite-dimensional generalization of the characteristic function. Indeed, any finite-dimensional version can be recovered by direct substitution of $\phi = \omega_1\phi_1 + \cdots + \omega_N\phi_N$ in $\hat{\mathscr{P}}_s(\phi)$, where the $\phi_n$ are fixed and where $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_N)$ takes the role of the $N$-dimensional Fourier variable. In fact, Gelfand's theory rests upon the principle that specifying an admissible functional $\hat{\mathscr{P}}_s(\phi)$ is equivalent to defining the underlying generalized stochastic process (Bochner-Minlos theorem). The precise statement of this result, which relies upon the fundamental notion of positive-definiteness, is given in Appendix I.
C. White noise processes
We define a white noise $w$ as a generalized random process that is stationary and whose measurements for non-overlapping test functions are independent. A remarkable aspect of the theory of generalized stochastic processes is that it is possible to deduce the complete class of such noises based on functional considerations only [17]. To that end, Gelfand and Vilenkin consider the generic class of functionals of the form

$$\hat{\mathscr{P}}_w(\phi) = \exp\left( \int_{\mathbb{R}} f\big(\phi(t)\big)\, dt \right) \qquad (5)$$

where $f$ is a continuous function on the real line and $\phi$ is a test function from some suitable space. This functional specifies an independent noise process if $\hat{\mathscr{P}}_w$ is continuous and positive-definite and $\hat{\mathscr{P}}_w(\phi_1 + \phi_2) = \hat{\mathscr{P}}_w(\phi_1)\,\hat{\mathscr{P}}_w(\phi_2)$ whenever $\phi_1$ and $\phi_2$ have non-overlapping support. The latter property is equivalent to having $f(0) = 0$ in (5). Gelfand and Vilenkin then go on to prove that the complete class of functionals of the form (5) with the required mathematical properties (positive-definiteness and factorizability) is obtained by choosing $f$ to be a Lévy exponent, as defined below.
Definition 1: A complex-valued continuous function $f(\omega)$ is a valid Lévy exponent if and only if $f(0) = 0$ and $g_\tau(\omega) = e^{\tau f(\omega)}$ is a positive-definite function of $\omega$ for all $\tau \in \mathbb{R}^+$.
The reader who is not familiar with the notion of positive definiteness is referred to Appendix I.
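Definition 1 can be probed numerically: for a valid Lévy exponent $f$, every matrix $[e^{\tau f(\omega_i - \omega_j)}]$ built from an arbitrary set of frequencies must be positive semi-definite. A sketch of such a check (hypothetical illustration using the Gaussian and Cauchy exponents; frequencies drawn at random):

```python
import numpy as np

def min_eig(f, tau, omegas):
    """Smallest eigenvalue of the Gram-type matrix [exp(tau * f(w_i - w_j))]."""
    d = omegas[:, None] - omegas[None, :]
    return np.linalg.eigvalsh(np.exp(tau * f(d))).min()

rng = np.random.default_rng(1)
w = rng.uniform(-5, 5, 40)

f_gauss = lambda w: -w**2 / 2    # Gaussian Levy exponent, cf. (9)
f_cauchy = lambda w: -np.abs(w)  # SaS exponent with alpha = 1 (Cauchy)

ok_gauss = all(min_eig(f_gauss, tau, w) > -1e-8 for tau in (0.1, 1.0, 5.0))
ok_cauchy = all(min_eig(f_cauchy, tau, w) > -1e-8 for tau in (0.1, 1.0, 5.0))
print(ok_gauss, ok_cauchy)  # prints: True True
```

An invalid choice such as $f(\omega) = -\omega^4$, whose exponential is not a characteristic function, fails this test for a sufficiently rich set of frequencies.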
In doing so, they actually establish a one-to-one correspondence between the characteristic forms of independent noise processes (5) and the family of infinitely-divisible laws whose characteristic function takes the form $\hat{p}_X(\omega) = e^{f(\omega)} = E\{e^{j\omega X}\}$ [33], [34]. While Definition 1 is hard to exploit directly, the good news is that there exists a complete, constructive characterization of Lévy exponents, which is a classical result in probability theory:
Theorem 1 (Lévy-Khintchine formula): $f(\omega)$ is a valid Lévy exponent if and only if it can be written as

$$f(\omega) = jb_1'\omega - \frac{b_2\omega^2}{2} + \int_{\mathbb{R}\setminus\{0\}} \left[ e^{ja\omega} - 1 - ja\omega\, 1_{\{|a|<1\}}(a) \right] V(da) \qquad (6)$$

where $b_1' \in \mathbb{R}$ and $b_2 \in \mathbb{R}^+$ are some constants and $V$ is a Lévy measure, that is, a (positive) Borel measure on $\mathbb{R}\setminus\{0\}$ such that

$$\int_{\mathbb{R}\setminus\{0\}} \min(1, a^2)\, V(da) < \infty. \qquad (7)$$
The notation $1_\Omega(a)$ refers to the indicator function that takes the value 1 if $a \in \Omega$ and zero otherwise. Theorem 1 is fundamental to the classical theories of infinitely-divisible laws and Lévy processes [21], [27], [34]. To further our mathematical understanding of the Lévy-Khintchine formula (6), we note that $e^{ja\omega} - 1 - ja\omega\, 1_{\{|a|<1\}}(a) \approx -\frac{a^2\omega^2}{2}$ as $a \to 0$. This ensures that the integral is convergent even when the Lévy measure $V$ is singular at the origin, to the extent allowed by the admissibility condition (7). If the Lévy measure is finite or symmetrical (i.e., $V(E) = V(-E)$ for any $E \subseteq \mathbb{R}$), it is then also possible to use the equivalent, simplified form of the Lévy exponent
$$f(\omega) = jb_1\omega - \frac{b_2\omega^2}{2} + \int_{\mathbb{R}\setminus\{0\}} \left( e^{ja\omega} - 1 \right) V(da) \qquad (8)$$

with $b_1 = b_1' - \int_{0<|a|<1} a\, V(da)$. The bottom line is that a particular brand of independent noise process is thereby completely characterized by its Lévy exponent or, equivalently, its Lévy triplet $(b_1, b_2, v)$, where $v$ is the so-called Lévy density associated with $V$ such that

$$V(E) = \int_E v(a)\, da$$
for any Borel set $E \subseteq \mathbb{R}$. With this latter convention, the three primary types of white noise encountered in the signal-processing literature are specified as follows:

1) Gaussian: $b_1 = 0$, $b_2 = 1$, $v = 0$:

$$f_{\mathrm{Gauss}}(\omega) = -\frac{|\omega|^2}{2}, \qquad \hat{\mathscr{P}}_w(\phi) = e^{-\frac{1}{2}\|\phi\|_{L_2}^2}. \qquad (9)$$
2) Compound Poisson: $b_1 = 0$, $b_2 = 0$, $v(a) = \lambda\, p_A(a)$ with $\int_{\mathbb{R}} p_A(a)\, da = \hat{p}_A(0) = 1$:

$$f_{\mathrm{Poisson}}(\omega;\, \lambda, p_A) = \lambda \int_{\mathbb{R}} \left( e^{ja\omega} - 1 \right) p_A(a)\, da,$$

$$\hat{\mathscr{P}}_w(\phi) = \exp\left( \lambda \int_{\mathbb{R}} \int_{\mathbb{R}} \big(e^{ja\phi(t)} - 1\big)\, p_A(a)\, da\, dt \right). \qquad (10)$$
3) Symmetric alpha-stable (S$\alpha$S): $b_1 = 0$, $b_2 = 0$, $v(a) = \frac{C_\alpha}{|a|^{\alpha+1}}$ with $0 < \alpha < 2$ and $C_\alpha = \frac{\sin(\frac{\pi\alpha}{2})}{\pi}$ a suitable normalization constant:

$$f_\alpha(\omega) = -\frac{|\omega|^\alpha}{\alpha!}, \qquad \hat{\mathscr{P}}_w(\phi) = e^{-\frac{1}{\alpha!}\|\phi\|_{L_\alpha}^\alpha}. \qquad (11)$$

The latter follows from the fact that $-\frac{|\omega|^\alpha}{\alpha!}$ is the generalized Fourier transform of $\frac{C_\alpha}{|t|^{\alpha+1}}$, with the convention that $\alpha! = \Gamma(\alpha + 1)$ where $\Gamma$ is Euler's Gamma function [35].
While none of these noises has a classical interpretation as a random function of $t$, we can at least provide an explicit description of the Poisson noise as a random sequence of Dirac impulses (cf. [22, Theorem 1]):

$$w_\lambda(t) = \sum_k a_k\, \delta(t - t_k)$$

where the $t_k$ are random locations that are uniformly distributed over $\mathbb{R}$ with density $\lambda$, and where the weights $a_k$ are i.i.d. random variables with pdf $p_A(a)$.
D. Gaussian versus sparse categorization
To get a better understanding of the underlying class of white noises $w$, we propose to probe them through some localized analysis window $\phi$, which yields a conventional i.i.d. random variable $X = \langle w, \phi \rangle$ with some pdf $p_\phi(x)$. The most convenient choice is to pick the rectangular analysis window $\phi(t) = \mathrm{rect}(t) = 1_{[-\frac{1}{2}, \frac{1}{2}]}(t)$ when $\langle w, \mathrm{rect} \rangle$ is well-defined. By using the fact that $e^{ja\omega\,\mathrm{rect}(t)} - 1 = e^{ja\omega} - 1$ for $t \in [-\frac{1}{2}, \frac{1}{2}]$, and zero otherwise, we find that the characteristic function of $X$ is simply given by

$$\hat{p}_{\mathrm{rect}}(\omega) = \hat{\mathscr{P}}_w\big(\omega\, \mathrm{rect}(t)\big) = \exp\big(f(\omega)\big),$$

which corresponds to the generic (Lévy-Khinchine) form associated with an infinitely-divisible distribution [27], [34], [36]. The above result makes the mapping between generalized white-noise processes and classical infinitely-divisible (id) laws$^3$ explicit: the canonical id pdf of $w$, $p_{\mathrm{id}}(x) = p_{\mathrm{rect}}(x)$, is obtained by observing the noise through a rectangular window. Conversely, given the Lévy exponent of an id distribution, $f(\omega) = \log\big(\mathcal{F}p_{\mathrm{id}}(\omega)\big)$, we can specify a corresponding generalized white-noise process $w$ via the characteristic form $\hat{\mathscr{P}}_w(\phi)$ by merely substituting the frequency variable $\omega$ by the generic test function $\phi(t)$, adding an integration over $\mathbb{R}$, and taking the exponential as in (5).
We note, in passing, that sparsity in signal processing may refer to two distinct notions. The first is that of a finite rate of innovation; i.e., a finite (but perhaps random) number of innovations per unit of time and/or space, which results in a mass at zero in the histogram of observations. The second possibility is to have a large, even infinite, number of innovations, but with the property that a few large innovations dominate the overall behavior. In this case, the histogram of observations is distinguished by its heavy tails. (A combination of the two is also possible, for instance in a compound Poisson process with a heavy-tailed amplitude distribution. For such a process, one may observe a change of behavior in passing from one dominant type of sparsity to the other.) Our framework permits us to consider both types of sparsity: in the former case with compound Poisson models and in the latter with heavy-tailed infinitely-divisible innovations.
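The heavy-tail type of sparsity is easy to exhibit by sampling a symmetric $\alpha$-stable variable with the Chambers-Mallows-Stuck generator (a standard algorithm, not taken from the paper): for $\alpha = 1.2$ a handful of large innovations dominates the aggregate, whereas at the Gaussian end point $\alpha = 2$ no single sample does:

```python
import numpy as np

def sas_sample(alpha, size, rng):
    """Chambers-Mallows-Stuck generator for symmetric alpha-stable variates
    (valid for alpha != 1; alpha = 2 gives a Gaussian with variance 2)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(3)
n = 100_000
x_heavy = sas_sample(1.2, n, rng)   # heavy-tailed, infinite variance
x_gauss = sas_sample(2.0, n, rng)   # Gaussian limit

# Fraction of the total l1 mass carried by the single largest sample.
dom = lambda x: np.max(np.abs(x)) / np.sum(np.abs(x))
print(dom(x_heavy) > 10 * dom(x_gauss))  # prints: True
```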
$^3$ A random variable $X$ with pdf $p_X(x)$ is said to be infinitely divisible (id) if for any $n \in \mathbb{N}^+$ there exist i.i.d. random variables $X_1, \ldots, X_n$ with pdf, say, $p_n(x)$ such that $X = X_1 + \cdots + X_n$ in law.
To make our point, we consider two distinct scenarios.
1) Finite-variance case: We first assume that the second moment $m_2 = \int_{\mathbb{R}\setminus\{0\}} a^2\, V(da)$ of the Lévy measure $V$ in (6) is finite. This allows us to rewrite the classical Lévy-Khinchine representation as

$$f(\omega) = jc_1\omega - \frac{b_2\omega^2}{2} + \int_{\mathbb{R}\setminus\{0\}} \left[ e^{ja\omega} - 1 - ja\omega \right] V(da)$$

with $c_1 = b_1' + \int_{|a|>1} a\, V(da)$, and where the Poisson part of the functional is now fully compensated. Indeed, we are guaranteed that the above integral is convergent because $|e^{ja\omega} - 1 - ja\omega| = O(|a|^2)$ as $a \to 0$ and $|e^{ja\omega} - 1 - ja\omega| = O(|a|)$ as $a \to \infty$. An interesting non-Poisson example of an infinitely-divisible probability law that falls into this category (with non-finite $V$) is the Laplace distribution with Lévy triplet $\big(0, 0, v(a) = \frac{e^{-|a|}}{|a|}\big)$ and $p(x) = \frac{1}{2}e^{-|x|}$. This model is particularly relevant for sparse signal processing because it provides a tight connection between Lévy processes and total-variation regularization [22, Section VI].
Now, if the Lévy measure is finite with $\int_{\mathbb{R}} V(da) = \lambda < \infty$, the admissibility condition yields $\int_{0<|a|<1} |a|\, V(da) < \infty$, which allows us to pull the bias correction out of the integral. The representation then simplifies to (8). This implies that we can decompose $X$ into the sum of two independent Gaussian and compound Poisson random variables. The variances of the Gaussian and Poisson components are $\sigma^2 = b_2$ and $\int_{\mathbb{R}} a^2\, V(da)$, respectively. The Poisson component is sparse because its pdf exhibits a mass distribution $e^{-\lambda}\delta(x)$ at the origin.
$$\hat{\mathscr{P}}_s(\phi) = \hat{\mathscr{P}}_{\mathrm{L}^{-1}w}(\phi) = \hat{\mathscr{P}}_w(\mathrm{L}^{-1*}\phi), \qquad (14)$$

where $\hat{\mathscr{P}}_w$ is given by (5) (or one of the specific forms in the list at the end of Section III-C) and where we are implicitly requiring that the adjoint $\mathrm{L}^{-1*}$ is mathematically well-defined (continuous) over $\mathcal{S}$, and that its composition with $\hat{\mathscr{P}}_w$ is well-defined for all $\phi \in \mathcal{S}$.
In order to realize the above idea mathematically, it is usually easier to proceed backwards: one specifies an operator $T^*$ that satisfies the left-inverse property: $\forall \phi \in \mathcal{S}$, $T^*\mathrm{L}^*\phi = \phi$. Finally, one applies a standard limit argument to extend the action of $T^* = \mathrm{L}^{-1*}$ over the enlarged class of tempered distributions $\mathcal{S}'$ based on the above adjoint relation, which yields the proper distributional definition of the right inverse of $\mathrm{L}$ in (13).
B. Inverse operators
Before presenting our general method of solution, we need to identify a suitable set of elementary inverse operators that satisfy the required boundedness conditions. Our approach relies on the factorization of a differential operator into simple first-order components of the form $(D - \alpha_n\mathrm{Id})$ with $\alpha_n \in \mathbb{C}$, which can then be treated separately. Three possible cases need to be considered.
1) Causal-stable: $\mathrm{Re}(\alpha_n) < 0$. This is the classical textbook hypothesis, which leads to a causal-stable convolution system. It is well known from linear-system theory that the causal Green function of $(D - \alpha_n\mathrm{Id})$ is the causal exponential function $\rho_{\alpha_n}(t)$ already encountered in the introductory example in Section II. Clearly, $\rho_{\alpha_n}(t)$ is absolutely integrable (and rapidly decaying) iff $\mathrm{Re}(\alpha_n) < 0$. It follows that $(D - \alpha_n\mathrm{Id})^{-1}f = \rho_{\alpha_n} * f$ with $\rho_{\alpha_n} \in \mathcal{R} \subset L_1$. In particular, this implies that $T = (D - \alpha_n\mathrm{Id})^{-1}$ specifies a continuous LSI operator on $\mathcal{S}$. The same holds for $T^* = (D - \alpha_n\mathrm{Id})^{-1*}$, which is defined as $T^*f = \rho_{\alpha_n}^\vee * f$.
2) Anti-causal stable: $\mathrm{Re}(\alpha_n) > 0$. This case is usually excluded because the standard Green function $\rho_{\alpha_n}(t) = 1_+(t)e^{\alpha_n t}$ grows exponentially, meaning that the system does not have a stable causal solution. Yet, it is possible to consider an alternative anti-causal Green function $\rho'_{\alpha_n}(t) = -\rho_{-\alpha_n}(-t) = -1_+(-t)\, e^{\alpha_n t}$, which is unique in the sense that it is the only Green function$^4$ of $(D - \alpha_n\mathrm{Id})$ that is Lebesgue-integrable and, by the same token, the proper inverse Fourier transform of $\frac{1}{j\omega - \alpha_n}$ for $\mathrm{Re}(\alpha_n) > 0$. In this way, we are able to specify an anti-causal inverse filter $(D - \alpha_n\mathrm{Id})^{-1}f = \rho'_{\alpha_n} * f$ with $\rho'_{\alpha_n} \in \mathcal{R}$ that is $L_p$-stable and $\mathcal{S}$-continuous. In the sequel, we will drop the prime superscript with the convention that

$$\rho_\alpha(t) = \begin{cases} 1_+(t)\, e^{\alpha t} & \text{if } \mathrm{Re}(\alpha) \le 0 \\ -1_+(-t)\, e^{\alpha t} & \text{otherwise,} \end{cases} \qquad (15)$$

which also covers the next scenario.
3) Marginally stable: $\mathrm{Re}(\alpha_n) = 0$ or, equivalently, $\alpha_n = j\omega_0$ with $\omega_0 \in \mathbb{R}$. This third case, which is incompatible with the conventional formulation of stationary processes, is most interesting theoretically because it opens the door to important extensions such as Lévy processes, as we shall see in Section V. Here, we will show that marginally-stable systems can be handled within our generalized framework as well, thanks to the introduction of appropriate inverse operators.
The first natural candidate for $(D - j\omega_0\mathrm{Id})^{-1}$ is the inverse filter whose frequency response is $\hat{\rho}_{j\omega_0}(\omega) = \frac{1}{j(\omega - \omega_0)} + \pi\delta(\omega - \omega_0)$. It is a convolution operator whose time-domain definition is

$$\mathrm{I}_{\omega_0}\phi(t) = (\rho_{j\omega_0} * \phi)(t) = e^{j\omega_0 t} \int_{-\infty}^{t} e^{-j\omega_0 \tau}\phi(\tau)\, d\tau. \qquad (16)$$

Its impulse response $\rho_{j\omega_0}(t)$ is causal and compatible with Definition (15), but not (rapidly) decaying. The adjoint of $\mathrm{I}_{\omega_0}$ is given by

$$\mathrm{I}^*_{\omega_0}\phi(t) = (\rho^\vee_{j\omega_0} * \phi)(t) = e^{-j\omega_0 t} \int_{t}^{+\infty} e^{j\omega_0 \tau}\phi(\tau)\, d\tau. \qquad (17)$$
While $\mathrm{I}_{\omega_0}\phi(t)$ and $\mathrm{I}^*_{\omega_0}\phi(t)$ are both well-defined when $\phi \in L_1$, the problem is that these inverse filters are not BIBO-stable since their impulse responses, $\rho_{j\omega_0}(t)$ and $\rho^\vee_{j\omega_0}(t)$, are not in $L_1$. In particular, one can easily see that
$^4$ $\rho$ is a Green function of $(D - \alpha_n\mathrm{Id})$ iff $(D - \alpha_n\mathrm{Id})\rho = \delta$; the complete set of solutions is given by $\rho(t) = \rho_{\alpha_n}(t) + C e^{\alpha_n t}$, which is the sum of the causal Green function $\rho_{\alpha_n}(t)$ plus an arbitrary exponential component that is in the null space of the operator.
$\mathrm{I}_{\omega_0}\phi$ (resp., $\mathrm{I}^*_{\omega_0}\phi$) with $\phi \in \mathcal{S}$ is generally not in $L_p$ with $1 \le p < +\infty$, unless $\hat{\phi}(\omega_0) = 0$ (resp., $\hat{\phi}(-\omega_0) = 0$). The conclusion is that $\mathrm{I}^*_{\omega_0}$ fails to be a bounded operator over the class of test functions $\mathcal{S}$.
This leads us to introduce a corrected version of the adjoint inverse operator $\mathrm{I}^*_{\omega_0}$:

$$\mathrm{I}^*_{\omega_0,t_0}\phi(t) = \mathrm{I}^*_{\omega_0}\left\{ \phi - \hat{\phi}(-\omega_0)\, e^{-j\omega_0 t_0}\, \delta(\cdot - t_0) \right\}(t) = \mathrm{I}^*_{\omega_0}\phi(t) - \hat{\phi}(-\omega_0)\, e^{-j\omega_0 t_0}\, \rho^\vee_{j\omega_0}(t - t_0), \qquad (18)$$

where $t_0 \in \mathbb{R}$ is a fixed location parameter and where $\hat{\phi}(\omega_0) = \int_{\mathbb{R}} e^{-j\omega_0 t}\phi(t)\, dt$ is the complex sinusoidal moment associated with the frequency $\omega_0$. The idea is to correct for the lack of decay of $\mathrm{I}^*_{\omega_0}\phi(t)$ as $t \to -\infty$ by subtracting a properly weighted version of the impulse response of the operator. An equivalent Fourier-based formulation is provided by the formula at the bottom of Table I; the main difference with the corresponding expression for $\mathrm{I}^*_{\omega_0}$ is the presence of a regularization term in the numerator that prevents the integrand from diverging at $\omega = -\omega_0$. The
next step is to identify the adjoint of $\mathrm{I}^*_{\omega_0,t_0}$, which is achieved via the following inner-product manipulation:

$$\langle \phi, \mathrm{I}^*_{\omega_0,t_0}\varphi \rangle = \langle \phi, \mathrm{I}^*_{\omega_0}\varphi \rangle - \hat{\varphi}(-\omega_0)\, e^{-j\omega_0 t_0}\, \langle \phi, \rho^\vee_{j\omega_0}(\cdot - t_0) \rangle \quad \text{(by linearity)}$$
$$= \langle \mathrm{I}_{\omega_0}\phi, \varphi \rangle - \langle \varphi, e^{j\omega_0 \cdot} \rangle\, e^{-j\omega_0 t_0}\, \mathrm{I}_{\omega_0}\phi(t_0) \quad \text{(using (16))}$$
$$= \langle \mathrm{I}_{\omega_0}\phi - e^{j\omega_0(\cdot - t_0)}\, \mathrm{I}_{\omega_0}\phi(t_0), \varphi \rangle.$$

Since the above is equal to $\langle \mathrm{I}_{\omega_0,t_0}\phi, \varphi \rangle$ by definition, we obtain that

$$\mathrm{I}_{\omega_0,t_0}\phi(t) = \mathrm{I}_{\omega_0}\phi(t) - e^{j\omega_0(t - t_0)}\, \mathrm{I}_{\omega_0}\phi(t_0). \qquad (19)$$
Interestingly, this operator imposes the boundary condition $\mathrm{I}_{\omega_0,t_0}\phi(t_0) = 0$ via the subtraction of a sinusoidal component that is in the null space of the operator $(D - j\omega_0\mathrm{Id})$, which gives a direct interpretation of the location parameter $t_0$. Observe that expressions (18) and (19) define linear operators, albeit not shift-invariant ones, in contrast with the classical inverse operators $\mathrm{I}_{\omega_0}$ and $\mathrm{I}^*_{\omega_0}$.
For analysis purposes, it is convenient to relate the proposed inverse operators to the anti-derivatives corresponding to the case $\omega_0 = 0$. To that end, we introduce the modulation operator

$$M_{\omega_0}\phi(t) = e^{j\omega_0 t}\phi(t),$$

which is a unitary map on $L_2$ with the property that $M^{-1}_{\omega_0} = M_{-\omega_0}$.
Proposition 1: The inverse operators defined by (16), (17), (19), and (18) satisfy the modulation relations

$$\mathrm{I}_{\omega_0}\phi(t) = M_{\omega_0}\, \mathrm{I}_0\, M^{-1}_{\omega_0}\phi(t),$$
$$\mathrm{I}^*_{\omega_0}\phi(t) = M^{-1}_{\omega_0}\, \mathrm{I}^*_0\, M_{\omega_0}\phi(t),$$
$$\mathrm{I}_{\omega_0,t_0}\phi(t) = M_{\omega_0}\, \mathrm{I}_{0,t_0}\, M^{-1}_{\omega_0}\phi(t),$$
$$\mathrm{I}^*_{\omega_0,t_0}\phi(t) = M^{-1}_{\omega_0}\, \mathrm{I}^*_{0,t_0}\, M_{\omega_0}\phi(t).$$
Proof: These follow from the modulation property of the Fourier transform (i.e., $\mathcal{F}\{M_{\omega_0}\phi\}(\omega) = \mathcal{F}\{\phi\}(\omega - \omega_0)$) and the observations that $\mathrm{I}_{\omega_0}\delta(t) = \rho_{j\omega_0}(t) = M_{\omega_0}\rho_0(t)$ and $\mathrm{I}^*_{\omega_0}\delta(t) = \rho^\vee_{j\omega_0}(t) = M^{-1}_{\omega_0}\rho^\vee_0(t)$ with $\rho_0(t) = 1_+(t)$ (the unit-step function).
The important functional property of $\mathrm{I}^*_{\omega_0,t_0}$ is that it essentially preserves decay and integrability, while $\mathrm{I}_{\omega_0,t_0}$ fully retains signal differentiability. Unfortunately, it is not possible to have the two simultaneously unless $\mathrm{I}_{\omega_0}\phi(t_0)$ and $\hat{\phi}(-\omega_0)$ are both zero.
Proposition 2: If $f \in L_{\infty,\alpha}$ with $\alpha > 1$, then there exists a constant $C_{t_0}$ such that

$$|\mathrm{I}^*_{\omega_0,t_0}f(t)| \le C_{t_0}\, \frac{\|f\|_{\infty,\alpha}}{1 + |t|^{\alpha - 1}},$$

which implies that $\mathrm{I}^*_{\omega_0,t_0}f \in L_{\infty,\alpha-1}$.
Proof: Since modulation does not affect the decay properties of a function, we can invoke Proposition 1 and concentrate on the investigation of the anti-derivative operator $\mathrm{I}^*_{0,t_0}$. Without loss of generality, we can also pick $t_0 = 0$ and transfer the bound to any other finite value of $t_0$ by adjusting the value of the constant $C_{t_0}$. Specifically, for $t < 0$, we write this inverse operator as

$$\mathrm{I}^*_{0,0}f(t) = \mathrm{I}^*_0 f(t) - \hat{f}(0) = \int_t^{+\infty} f(\tau)\, d\tau - \int_{-\infty}^{+\infty} f(\tau)\, d\tau = -\int_{-\infty}^{t} f(\tau)\, d\tau.$$

This implies that

$$|\mathrm{I}^*_{0,0}f(t)| = \left| \int_{-\infty}^{t} f(\tau)\, d\tau \right| \le \|f\|_{\infty,\alpha} \int_{-\infty}^{t} \frac{d\tau}{1 + |\tau|^\alpha} \le C_0\, \frac{\|f\|_{\infty,\alpha}}{1 + |t|^{\alpha - 1}}$$

for all $t < 0$ and a suitable constant $C_0$ that depends only on $\alpha$. For $t > 0$, $\mathrm{I}^*_{0,0}f(t) = \int_t^{+\infty} f(\tau)\, d\tau$, so that the above upper bounds remain valid.
The interpretation of the above result is that the inverse operator $\mathrm{I}^*_{\omega_0,t_0}$ reduces inverse-polynomial decay by one order. Proposition 2 actually implies that the operator preserves the rapid decay of the Schwartz functions, which are included in $L_{\infty,\alpha}$ for any $\alpha \in \mathbb{R}^+$. It also guarantees that $\mathrm{I}^*_{\omega_0,t_0}\phi$ belongs to $L_p$ for any Schwartz function $\phi$. However, $\mathrm{I}^*_{\omega_0,t_0}$ will spoil the global smoothness properties of $\phi$ because it introduces a discontinuity at $t_0$, unless $\hat{\phi}(-\omega_0)$ is zero, in which case the output remains in the Schwartz class. This allows us to state the following theorem, which summarizes the higher-level part of those results for further reference.
Theorem 2: The operator $\mathrm{I}^*_{\omega_0,t_0}$ defined by (18) is a continuous linear map from $\mathcal{R}$ into $\mathcal{R}$ (the space of bounded functions with rapid decay). Its adjoint $\mathrm{I}_{\omega_0,t_0}$ is given by (19) and has the property that $\mathrm{I}_{\omega_0,t_0}\phi(t_0) = 0$. Together, these operators satisfy the complementary left- and right-inverse relations

$$\mathrm{I}^*_{\omega_0,t_0}(D - j\omega_0\mathrm{Id})^*\phi = \phi$$
$$(D - j\omega_0\mathrm{Id})\, \mathrm{I}_{\omega_0,t_0}\phi = \phi$$

for all $\phi \in \mathcal{S}$.
Having a tight control on the action of $\mathrm{I}^*_{\omega_0,t_0}$ over $\mathcal{S}$ allows us to extend the right-inverse operator $\mathrm{I}_{\omega_0,t_0}$ to an appropriate subset of the tempered distributions $\mathcal{S}'$ according to the rule $\langle \mathrm{I}_{\omega_0,t_0}\varphi, \phi \rangle = \langle \varphi, \mathrm{I}^*_{\omega_0,t_0}\phi \rangle$. Our complete
TABLE I
FIRST-ORDER DIFFERENTIAL OPERATORS AND THEIR INVERSES

Standard case: $\alpha_n \in \mathbb{C}$, $\mathrm{Re}(\alpha_n) \ne 0$

$\mathrm{L} = (D - \alpha_n\mathrm{Id})$:
$\quad (D - \alpha_n\mathrm{Id})^{-1}f(t) = \int_{\mathbb{R}} \hat{f}(\omega)\, \frac{1}{j\omega - \alpha_n}\, e^{j\omega t}\, \frac{d\omega}{2\pi}$
$\quad$ [$L_p$-stable, LSI, $\mathcal{S}$-continuous]

$\mathrm{L} = (D - \alpha_n\mathrm{Id})^*$:
$\quad (D - \alpha_n\mathrm{Id})^{-1*}f(t) = \int_{\mathbb{R}} \hat{f}(\omega)\, \frac{1}{-j\omega - \alpha_n}\, e^{j\omega t}\, \frac{d\omega}{2\pi}$
$\quad$ [$L_p$-stable, LSI, $\mathcal{S}$-continuous]

Critical case: $\alpha_n = j\omega_0$, $\omega_0 \in \mathbb{R}$

$\mathrm{L} = (D - j\omega_0\mathrm{Id})$:
$\quad \mathrm{I}_{\omega_0}f(t) = \int_{\mathbb{R}} \hat{f}(\omega) \left[ \frac{1}{j(\omega - \omega_0)} + \pi\delta(\omega - \omega_0) \right] e^{j\omega t}\, \frac{d\omega}{2\pi}$
$\quad$ [Causal, LSI]

$\quad \mathrm{I}_{\omega_0,t_0}f(t) = \int_{\mathbb{R}} \hat{f}(\omega)\, \frac{e^{j\omega t} - e^{j\omega_0(t - t_0)}\, e^{j\omega t_0}}{j(\omega - \omega_0)}\, \frac{d\omega}{2\pi}$
$\quad$ [Output vanishes at $t = t_0$]

$\mathrm{L} = (D - j\omega_0\mathrm{Id})^*$:
$\quad \mathrm{I}^*_{\omega_0}f(t) = \int_{\mathbb{R}} \left[ -\frac{\hat{f}(\omega)}{j(\omega + \omega_0)} + \pi\hat{f}(-\omega_0)\,\delta(\omega + \omega_0) \right] e^{j\omega t}\, \frac{d\omega}{2\pi}$
$\quad$ [Anti-causal, LSI]

$\quad \mathrm{I}^*_{\omega_0,t_0}f(t) = \int_{\mathbb{R}} \frac{\hat{f}(-\omega_0)\, e^{-j(\omega + \omega_0)t_0} - \hat{f}(\omega)}{j(\omega + \omega_0)}\, e^{j\omega t}\, \frac{d\omega}{2\pi}$
$\quad$ [$L_p$-stable and decay preserving]
set of inverse operators is summarized in Table I together with their equivalent Fourier-based definitions, which are also interpretable in the generalized sense of distributions.
C. Solution of generic stochastic differential equation
We now have all the elements to solve the generic stochastic linear differential equation

$$\sum_{n=0}^{N} a_n D^n s = \sum_{m=0}^{M} b_m D^m w \qquad (20)$$

where the $a_n$ and $b_m$ are arbitrary complex coefficients with the normalization constraint $a_N = 1$. While this reminds us of the textbook formula of an ordinary $N$th-order differential system, the non-standard aspect in (20) is that the driving term is a white-noise process $w$, which is generally not defined pointwise, and that we are not imposing any stability constraint. Eq. (20) thus covers the general case (12) where $\mathrm{L}$ is a shift-invariant operator with the rational transfer function
with the rational transfer function
L() =
(j)
N
+a
N1
(j)
N1
+ +a
1
(j) +a
0
b
M
(j)
M
+ +b
1
(j) +b
0
=
P
N
(j)
Q
M
(j)
. (21)
The poles of the system, which are the roots of the characteristic polynomial $P_N(\zeta) = \zeta^N + a_{N-1}\zeta^{N-1} + \cdots + a_0$ with Laplace variable $\zeta \in \mathbb{C}$, are denoted by $\{\alpha_n\}_{n=1}^{N}$. While we are not imposing any restriction on their locus in the complex plane, we are adopting a special ordering where the purely imaginary roots (if present) are coming
last. This allows us to factorize the numerator of (21) as

$$P_N(j\omega) = \prod_{n=1}^{N} (j\omega - \alpha_n) = \left[ \prod_{n=1}^{N - n_0} (j\omega - \alpha_n) \right] \left[ \prod_{m=1}^{n_0} (j\omega - j\omega_m) \right] \qquad (22)$$

with $\alpha_{N - n_0 + m} = j\omega_m$, $1 \le m \le n_0$, where $n_0$
is the number of purely imaginary poles. The operator counterpart of this last equation is the decomposition

$$P_N(D) = \underbrace{(D - \alpha_1\mathrm{Id}) \cdots (D - \alpha_{N-n_0}\mathrm{Id})}_{\text{regular part}}\; \underbrace{(D - j\omega_1\mathrm{Id}) \cdots (D - j\omega_{n_0}\mathrm{Id})}_{\text{critical part}}$$
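The factorization and the special ordering of the poles can be mirrored numerically (a hypothetical helper; the tolerance used to declare a pole "purely imaginary" is an implementation choice):

```python
import numpy as np

def split_poles(a, tol=1e-9):
    """Roots of P_N(z) = z^N + a[N-1] z^{N-1} + ... + a[0], split into
    regular poles and purely imaginary ('critical') poles, cf. Eq. (22)."""
    coeffs = np.concatenate(([1.0], a[::-1]))   # numpy.roots coefficient order
    poles = np.roots(coeffs)
    regular = [p for p in poles if abs(p.real) > tol]
    critical = [p for p in poles if abs(p.real) <= tol]
    return regular, critical

# Example: P_3(z) = (z + 1)(z^2 + 4) = z^3 + z^2 + 4z + 4 has one stable
# pole at -1 and the critical pair +/- 2j (n_0 = 2 purely imaginary poles).
a = np.array([4.0, 4.0, 1.0])   # a[0] = a_0, a[1] = a_1, a[2] = a_2
regular, critical = split_poles(a)
print(len(regular), len(critical))  # prints: 1 2
```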
which involves a cascade of elementary first-order components. By applying the proper sequence of right-inverse operators from Table I, we can then formally solve the system as in (13). The resulting inverse operator is

$$\mathrm{L}^{-1} = \underbrace{\mathrm{I}_{\omega_{n_0}, t_{n_0}} \cdots \mathrm{I}_{\omega_1, t_1}}_{\text{shift-variant}}\; T_{\mathrm{LSI}} \qquad (23)$$

with

$$T_{\mathrm{LSI}} = (D - \alpha_{N - n_0}\mathrm{Id})^{-1} \cdots (D - \alpha_1\mathrm{Id})^{-1}\, Q_M(D),$$
which imposes the $n_0$ boundary conditions

$$s(t)|_{t = t_{n_0}} = 0$$
$$(D - j\omega_{n_0}\mathrm{Id})\, s(t)|_{t = t_{n_0 - 1}} = 0$$
$$\vdots$$
$$(D - j\omega_2\mathrm{Id}) \cdots (D - j\omega_{n_0}\mathrm{Id})\, s(t)|_{t = t_1} = 0. \qquad (24)$$
The corresponding adjoint operator is given by

$$\mathrm{L}^{-1*} = T^*_{\mathrm{LSI}}\; \underbrace{\mathrm{I}^*_{\omega_1, t_1} \cdots \mathrm{I}^*_{\omega_{n_0}, t_{n_0}}}_{\text{shift-variant}}, \qquad (25)$$

and is guaranteed to be a continuous linear mapping from $\mathcal{S}$ into $\mathcal{R}$ by Theorem 2, the key point being that each of the component operators preserves the rapid decay of the test function to which it is applied. The last step is to substitute the explicit form (25) of $\mathrm{L}^{-1*}$ into (14) with a $\hat{\mathscr{P}}_w$ that is well-defined on $\mathcal{R}$, which yields the characteristic form of the stochastic process $s$ defined by (20) subject to the boundary conditions (24).
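For the simplest critical system $\mathrm{L} = D$ (a single pole at $\omega_0 = 0$, $n_0 = 1$), the recipe reduces to $s = \mathrm{I}_{0,t_0} w$ with the boundary condition $s(t_0) = 0$; driven by compound Poisson noise, this yields a piecewise-constant Lévy process. A hypothetical simulation sketch with $t_0 = 0$ (rate and amplitude law chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(4)
lam, T = 2.0, 10_000

# Compound Poisson innovation w(t) = sum_k a_k delta(t - t_k) on [0, T].
n_imp = rng.poisson(lam * T)
t_k = np.sort(rng.uniform(0, T, n_imp))
a_k = rng.laplace(0, 1, n_imp)           # sparse, Laplace-distributed jumps

# s = I_{0,0} w : running sum of the impulses, pinned so that s(0) = 0.
grid = np.arange(T + 1)                  # integer observation times
s = np.concatenate(([0.0], np.cumsum(a_k)))[np.searchsorted(t_k, grid, side='right')]

# Boundary condition (24) with t0 = 0, and independence of the increments.
inc = np.diff(s)                         # increments over disjoint unit intervals
rho = np.corrcoef(inc[:-1], inc[1:])[0, 1]
print(s[0] == 0.0, abs(rho) < 0.05)  # prints: True True
```

Unlike the stationary example of Section II, the variance of $s(t)$ grows linearly with $t$: the boundary condition, not stationarity, anchors the process.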
We close this section with a comment about commutativity: while the order of application of the operators $Q_M(\mathrm{D})$ and $(\mathrm{D} - \alpha_n\mathrm{Id})^{-1}$ in the LSI part of (23) is immaterial (thanks to the commutativity of convolution), it is not so for the inverse operators $\mathrm{I}_{\omega_m,t_m}$ that appear in the shift-variant part of the decomposition. The latter do not commute, and their order of application is tightly linked to the boundary conditions.
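The non-commutation of anchored anti-derivative operators is easy to probe numerically. The following sketch is our own illustration (the grid, the test function, and the boundary points are arbitrary choices, not taken from the paper): each operator is discretized as a cumulative sum re-anchored so that its output vanishes at a prescribed sample, and the two orders of application give visibly different results.

```python
import numpy as np

# Illustrative check that right-inverse integrators anchored at different
# boundary points do not commute. Each operator is a discrete anti-derivative
# re-anchored so that the output vanishes at the sample index of its t0.
t = np.linspace(-1.0, 1.0, 201)
dt = t[1] - t[0]
phi = np.exp(-t**2)                       # an arbitrary smooth test function

def anti_derivative(x, i0):
    """Discrete anti-derivative of x that vanishes at sample index i0."""
    c = np.cumsum(x) * dt
    return c - c[i0]

ia, ib = 50, 150                          # indices of the two boundary points
out_ab = anti_derivative(anti_derivative(phi, ib), ia)   # first b, then a
out_ba = anti_derivative(anti_derivative(phi, ia), ib)   # first a, then b
gap = np.max(np.abs(out_ab - out_ba))     # strictly positive: no commutation
```

The two outputs differ by an affine function tied to the boundary values, which is precisely why the order of the shift-variant factors is linked to the boundary conditions (24).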
V. SPARSE STOCHASTIC PROCESSES
This section is devoted to the characterization and investigation of the properties of the broad family of stochastic processes specified by the innovation model (12) where L is LSI. It covers the non-Gaussian stationary processes (V-A), which are generated by conventional analog filtering of a sparse innovation, as well as the whole class of processes that are solutions of the (possibly unstable) differential equation (20) with a Lévy noise excitation (V-B). The latter category constitutes the higher-order generalization of the classical Lévy processes, which are non-stationary.
We have just addressed the fundamental issue of the solvability of the operator equation $\mathrm{L}s = w$. The only missing ingredient is that one needs to ensure that the formal solution $s = \mathrm{L}^{-1}w$ is a bona fide generalized stochastic process. The answer, of course, is dependent upon whether or not we are able to exhibit an (adjoint) inverse operator $\mathrm{T} = \mathrm{L}^{-1*}$ that is sufficiently well-behaved for the resulting characteristic form $\widehat{\mathscr{P}}_s(\varphi) = \widehat{\mathscr{P}}_w(\mathrm{T}\varphi)$ to satisfy the sufficient conditions (continuity, positive-definiteness, and normalization) for existence, as stated in the Minlos-Bochner theorem (Theorem 4). To that end, we shall rely on the following result, whose proof is given in Appendix II.
Theorem 3 (Admissibility): Let $f$ be a valid Lévy exponent and $\mathrm{T}$ an operator acting on $\mathcal{S}$ such that any one of the conditions below is met:
1) $\mathrm{T}$ is a continuous linear map from $\mathcal{S}$ into itself;
2) $\mathrm{T}$ is a continuous linear map from $\mathcal{S}$ into $L_p$ and the Lévy exponent $f$ is $p$-admissible in the sense that $|f(u)| + |u|\,|f'(u)| \le C|u|^p$ for all $u \in \mathbb{R}$, where $1 \le p < \infty$ and $C$ is a positive constant.
Then,
$$\widehat{\mathscr{P}}_s(\varphi) = \exp\left(\int_{\mathbb{R}} f\big(\mathrm{T}\varphi(t)\big)\,\mathrm{d}t\right)$$
is a continuous, positive-definite functional on $\mathcal{S}$ such that $\widehat{\mathscr{P}}_s(0) = 1$.
A. Non-Gaussian stationary processes
The simplest scenario is when $\mathrm{L}^{-1}$ is LSI and can be decomposed into a cascade of BIBO-stable and ordinary differential operators. If the BIBO-stable part is rapidly-decreasing, then $\mathrm{L}^{-1*}$ is guaranteed to be $\mathcal{S}$-continuous. In particular, this covers the case of an $N$th-order differential system without any pole on the imaginary axis, as justified by our analysis in Section IV-C.
Proposition 3 (Generalized stationary processes): Let $\mathrm{L}^{-1*}$ (the adjoint of the right-inverse of some operator L) be an $\mathcal{S}$-continuous convolution operator characterized by its impulse response $\rho_L = \mathrm{L}^{-1*}\delta$. Then, the generalized stochastic processes that are defined by
$$\widehat{\mathscr{P}}_s(\varphi) = \exp\left(\int_{\mathbb{R}} f\big((\rho_L * \varphi)(t)\big)\,\mathrm{d}t\right),$$
where $f(u)$ is of the generic form (6), are stationary and well-defined solutions of the operator equation (12) driven by some corresponding innovation process $w$.
Proof: The fact that these generalized processes are well-defined is a direct consequence of the Minlos-Bochner theorem since $\mathrm{L}^{-1*}$ (the convolution with $\rho_L$) satisfies the first admissibility condition in Theorem 3. The stationarity property is equivalent to $\widehat{\mathscr{P}}_s(\varphi) = \widehat{\mathscr{P}}_s\big(\varphi(\cdot - t_0)\big)$ for all $t_0 \in \mathbb{R}$; it is established by a simple change of variable in the inner integral using the basic shift-invariance property of convolution; i.e., $\big(\rho_L * \varphi(\cdot - t_0)\big)(t) = (\rho_L * \varphi)(t - t_0)$.
The above characterization is not only remarkably concise but also quite general. It extends the traditional theory of stationary Gaussian processes, which corresponds to the choice $f(u) = -\frac{\sigma_0^2}{2}u^2$. The Gaussian case results in the simplified form
$$\int_{\mathbb{R}} f\big((\rho_L * \varphi)(t)\big)\,\mathrm{d}t = -\frac{\sigma_0^2}{2}\,\|\rho_L * \varphi\|_{L_2}^2 = -\frac{1}{4\pi}\int_{\mathbb{R}} \Phi_s(\omega)\,|\hat{\varphi}(\omega)|^2\,\mathrm{d}\omega$$
(using Parseval's identity), where $\Phi_s(\omega) = \frac{\sigma_0^2}{|\hat{L}(\omega)|^2}$ is the spectral power density that is associated with the innovation model. The interest here is that we get access to a much broader family of non-Gaussian processes (e.g., generalized Poisson or alpha-stable) with matched spectral properties since they share the same whitening operator L.
The characteristic form condenses all the statistical information about the process. For instance, by setting $\varphi = \omega\,\delta(\cdot - t_0)$, we can explicitly determine $\widehat{\mathscr{P}}_s(\varphi) = \mathbb{E}\{\mathrm{e}^{\mathrm{j}\langle s, \varphi\rangle}\} = \mathbb{E}\{\mathrm{e}^{\mathrm{j}\omega s(t_0)}\} = \mathcal{F}\big\{p_{s(t_0)}\big\}(\omega)$, which yields the characteristic function of the first-order probability density, $p(s(t_0)) = p(s)$, of the sample values of the process. In the present stationary scenario, we find that $p(s) = \mathcal{F}^{-1}\left\{\exp\left(\int_{\mathbb{R}} f\big(\omega\rho_L(t)\big)\,\mathrm{d}t\right)\right\}(s)$, which requires the evaluation of an integral followed by an inverse Fourier transform. While this type of calculation is only tractable analytically in special cases, it may be performed numerically with the help of the FFT. Higher-order density functions are accessible as well, at the cost of some multi-dimensional inverse Fourier transforms. The same applies for moments, which can be obtained through a simpler differentiation process, as exemplified in Section V-C.
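To make the FFT route concrete, here is a minimal numerical sketch (all grid parameters are our own illustrative choices). Rather than a characteristic function derived from a specific $\rho_L$, we take one that is known in closed form, that of the Laplace density $p(s) = \tfrac{1}{2}\mathrm{e}^{-|s|}$, namely $1/(1+\omega^2)$, and recover the pdf by a discretized inverse Fourier transform:

```python
import numpy as np

# Numerical inversion p(s) = (1/2pi) * int phat(w) exp(-j*w*s) dw via the FFT.
# phat is the characteristic function of the Laplace law, so the output can be
# checked against the closed form p(s) = 0.5 * exp(-|s|).
N = 2**14                          # number of frequency samples
dw = 0.1                           # frequency step
w = (np.arange(N) - N // 2) * dw   # symmetric frequency grid
phat = 1.0 / (1.0 + w**2)          # characteristic function of the Laplace pdf

ds = 2 * np.pi / (N * dw)          # dual grid step implied by the DFT
s = (np.arange(N) - N // 2) * ds   # signal-domain grid, centered at 0
# ifftshift moves w = 0 to the first FFT bin; fftshift re-centers the output.
p = np.real(np.fft.fftshift(np.fft.fft(np.fft.ifftshift(phat)))) * dw / (2 * np.pi)
```

The same inversion step applies unchanged when the exponent in the characteristic function is itself obtained by numerical integration.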
B. Generalized Lévy processes
The further-reaching aspect of the present formulation is that it is also applicable to the characterization of non-stationary processes such as Brownian motion and Lévy processes, which are usually treated separately from the stationary ones, and that it naturally leads to the identification of a whole variety of higher-order extensions. The commonality is that these non-stationary processes can all be derived as solutions of an (unstable) $N$th-order differential equation with some poles on the imaginary axis. This corresponds to the setting in Section IV-C with $n_0 > 0$.
Proposition 4 (Generalized Nth-order Lévy processes): Let $\mathrm{L}^{-1}$ (the right-inverse of an $N$th-order differential operator L) be specified by (23) with at least one non-shift-invariant factor $\mathrm{I}_{\omega_1,t_1}$. Then, the generalized stochastic processes that are defined by
$$\widehat{\mathscr{P}}_s(\varphi) = \exp\left(\int_{\mathbb{R}} f\big(\mathrm{L}^{-1*}\varphi(t)\big)\,\mathrm{d}t\right),$$
where $f(u)$ is of the generic form (6) subject to the constraint $|f(u)| + |u|\,|f'(u)| \le C|u|^p$ for some $p \ge 1$, are well-defined solutions of the stochastic differential equation (20) driven by some corresponding Lévy white noise $w$. These processes satisfy the boundary conditions (24) and are non-stationary.
Note that the $p$-admissibility condition on the Lévy exponent $f$ is satisfied by the great majority of the members of the Lévy-Khinchine family. For instance, in the compound Poisson case, we can show that $|u|\,|f'(u)| \le |u|\,\mathbb{E}\{|A|\}$ and $|f(u)| \le |u|\,\mathbb{E}\{|A|\}$ by using the fact that $|\mathrm{e}^{\mathrm{j}x} - 1| \le |x|$; this implies that the bound in Theorem 3 with $p = 1$ is always satisfied provided that the first (absolute) moment of the amplitude pdf $p_A(a)$ in (10) is finite. The only cases we are aware of that do not fulfill the condition are the alpha-stable noises with $0 < \alpha < 1$, which are notorious for their exotic behavior.
Proof: The result is a direct consequence of the analysis in Section IV-C (in particular, Eqs. (23)-(25)) and Proposition 2. The latter implies that $\mathrm{L}^{-1*}\varphi$ is bounded in all $L_{\infty,m}$ norms with $m \ge 1$. Since $\mathcal{S} \subseteq L_{\infty,m} \subseteq L_p$ and the Schwartz topology is the strongest in this chain, we can infer that $\mathrm{L}^{-1*}$ is a continuous operator from $\mathcal{S}$ into any of the $L_p$ spaces with $p \ge 1$. The existence claim then follows from the combination of Theorem 3 and Minlos-Bochner. Since $\mathrm{L}^{-1}$ is not shift-invariant, there is no chance for these processes to be stationary, not to mention the fact that they fulfill the boundary conditions (24).
Conceptually, we like to view the generalized stochastic processes of Proposition 4 as adjusted versions of the stationary ones that include some additional sinusoidal (or polynomial) trends. While the generation mechanism of these trends is random, there is a deterministic aspect to it because it imposes the boundary conditions (24) at $t_1, \ldots, t_{n_0}$. The class of such processes is actually quite rich and the formalism surprisingly powerful. We shall illustrate the use of Proposition 4 in Section VI with the simplest possible operator $\mathrm{L} = \mathrm{D}$, which gets us back to Brownian motion and the celebrated family of Lévy processes. We shall also show how the well-known properties of Lévy processes can be readily deduced from their characteristic form.
C. Moments and correlation
The covariance form of a generalized (complex-valued) process $s$ is defined as
$$B_s(\varphi_1, \varphi_2) = \mathbb{E}\big\{\langle s, \varphi_1\rangle\,\overline{\langle s, \varphi_2\rangle}\big\},$$
where $\overline{\langle s, \varphi_2\rangle} = \langle s, \overline{\varphi_2}\rangle$ when $s$ is real-valued. Thanks to the moment-generating properties of the Fourier transform, this functional can be calculated from the characteristic form $\widehat{\mathscr{P}}_s(\varphi)$ as
$$B_s(\varphi_1, \varphi_2) = (-\mathrm{j})^2\, \frac{\partial^2}{\partial\omega_1\,\partial\omega_2}\, \widehat{\mathscr{P}}_s\big(\omega_1\varphi_1 + \omega_2\overline{\varphi_2}\big)\Big|_{\omega_1=0,\,\omega_2=0}, \qquad (26)$$
where we are implicitly assuming that the required partial derivative of the characteristic functional exists. The autocorrelation of the process is then obtained by making the formal substitution $\varphi_1 = \delta(\cdot - t_1)$ and $\varphi_2 = \delta(\cdot - t_2)$:
$$R_s(t_1, t_2) = \mathbb{E}\big\{s(t_1)\,\overline{s(t_2)}\big\} = B_s\big(\delta(\cdot - t_1), \delta(\cdot - t_2)\big).$$
Alternatively, we can also retrieve the autocorrelation function by invoking the kernel theorem:
$$B_s(\varphi_1, \varphi_2) = \int_{\mathbb{R}^2} R_s(t_1, t_2)\,\varphi_1(t_1)\,\overline{\varphi_2(t_2)}\,\mathrm{d}t_1\,\mathrm{d}t_2.$$
The concept also generalizes for the calculation of the higher-order correlation form⁵
$$\mathbb{E}\big\{\langle s, \varphi_1\rangle\,\langle s, \varphi_2\rangle \cdots \langle s, \varphi_N\rangle\big\} = (-\mathrm{j})^N\, \frac{\partial^N \widehat{\mathscr{P}}_s(\omega_1\varphi_1 + \cdots + \omega_N\varphi_N)}{\partial\omega_1 \cdots \partial\omega_N}\Big|_{\omega_1=0,\ldots,\omega_N=0},$$
which provides the basis for the determination of higher-order moments and cumulants.
Here, we concentrate on the calculation of the second-order moments, which happen to be independent of the specific type of noise. For the cases where the covariance is defined and finite, it is not hard to show that the generic covariance form of the white noise processes defined in Section III-C is
$$B_w(\varphi_1, \varphi_2) = \sigma_0^2\,\langle \varphi_1, \varphi_2\rangle,$$
where $\sigma_0^2$ is a suitable normalization constant that depends on the noise parameters $(b_1, b_2, v)$ in (7)-(10). We then perform the usual adjoint manipulation to transfer the above formula to the filtered version $s = \mathrm{L}^{-1}w$ of such a noise process.

⁵For simplicity, we are only giving the formula for a real-valued process.
Property 1 (Generalized correlation): The covariance form of the generalized stochastic process whose characteristic form is $\widehat{\mathscr{P}}_s(\varphi) = \widehat{\mathscr{P}}_w(\mathrm{L}^{-1*}\varphi)$, where $\widehat{\mathscr{P}}_w$ is a white noise functional, is given by
$$B_s(\varphi_1, \varphi_2) = \sigma_0^2\,\langle \mathrm{L}^{-1*}\varphi_1, \mathrm{L}^{-1*}\varphi_2\rangle = \sigma_0^2\,\langle \mathrm{L}^{-1}\mathrm{L}^{-1*}\varphi_1, \varphi_2\rangle,$$
and corresponds to the correlation function
$$R_s(t_1, t_2) = \mathbb{E}\big\{s(t_1)\,\overline{s(t_2)}\big\} = \sigma_0^2\,\big\langle \mathrm{L}^{-1}\mathrm{L}^{-1*}\delta(\cdot - t_1), \delta(\cdot - t_2)\big\rangle.$$
The latter characterization requires the determination of the impulse response of $\mathrm{L}^{-1}\mathrm{L}^{-1*}$. In particular, when $\mathrm{L}^{-1}$ is LSI with convolution kernel $\rho_L = \mathrm{L}^{-1}\delta$, we get that
$$R_s(t_1, t_2) = \sigma_0^2\, \mathrm{L}^{-1}\mathrm{L}^{-1*}\delta(t_2 - t_1) = r_s(t_2 - t_1) = \sigma_0^2\,(\rho_L * \rho_L^{\vee})(t_2 - t_1),$$
which confirms that the underlying process is wide-sense stationary. Since the autocorrelation function $r_s(\tau)$ is integrable, we also have a one-to-one correspondence with the traditional notion of power spectrum: $\Phi_s(\omega) = \mathcal{F}\{r_s\}(\omega) = \frac{\sigma_0^2}{|\hat{L}(\omega)|^2}$, where $\hat{L}(\omega)$ is the frequency response of the whitening operator L.
The determination of the correlation function for the non-stationary processes associated with the unstable versions of (20) is more involved. We shall see in [28] that it can be bypassed if, instead of $s(t)$, we consider the generalized increment process $s_d(t) = \mathrm{L}_d s(t)$, where $\mathrm{L}_d$ is a discrete version (finite-difference-type operator) of the whitening operator L.
D. Sparsification in a wavelet-like basis
The implicit assumption for the next properties is that we have a wavelet-like basis $\{\psi_{i,k}\}_{i\in\mathbb{Z},k\in\mathbb{Z}}$ available that is matched to the operator L. Specifically, the basis functions $\psi_{i,k}(t) = \psi_i(t - 2^i k)$ with scale and location indices $(i, k)$ are translated versions of some normalized reference wavelet $\psi_i = \mathrm{L}^{*}\phi_i$, where $\phi_i$ is an appropriate scale-dependent smoothing kernel. It turns out that such operator-like wavelets can be constructed for the whole class of ordinary differential operators considered in this paper [32]. They can be specified to be orthogonal and/or compactly supported (cf. examples in Fig. 2). In the case of the classical Haar wavelet, we have that $\psi_{\mathrm{Haar}} = \mathrm{D}\phi_i$, where the smoothing kernels $\phi_i \propto \phi_0(t/2^i)$ are rescaled versions of a triangle function (B-spline of degree 1). The latter dilation property follows from the fact that the derivative operator D commutes with scaling.
We note that the determination of the wavelet coefficients $v_i[k] = \langle s, \psi_{i,k}\rangle$ of the random signal $s$ at a given scale $i$ is equivalent to correlating the signal with the wavelet $\psi_i$ (continuous wavelet transform) and sampling thereafter. The good news is that this has a stationarizing and decoupling effect.
Property 2 (Wavelet-domain probability laws): Let $v_i(t) = \langle s, \psi_i(\cdot - t)\rangle$ with $\psi_i = \mathrm{L}^{*}\phi_i$ be the $i$th channel of the continuous wavelet transform of a generalized (stationary or non-stationary) Lévy process $s$ with whitening operator L and $p$-admissible Lévy exponent $f$. Then, $v_i(t)$ is a generalized stationary process with characteristic functional $\widehat{\mathscr{P}}_{v_i}(\varphi) = \widehat{\mathscr{P}}_w(\phi_i * \varphi)$, where $\widehat{\mathscr{P}}_w$ is defined by (5). Moreover, the characteristic function of the (discrete) wavelet coefficient $v_i[k] = v_i(2^i k)$, that is, the Fourier transform of the pdf $p_{v_i}(v)$, is given by $\hat{p}_{v_i}(\omega) = \widehat{\mathscr{P}}_w(\omega\phi_i) = \mathrm{e}^{f_i(\omega)}$ and is infinitely divisible with modified Lévy exponent
$$f_i(\omega) = \int_{\mathbb{R}} f\big(\omega\,\phi_i(t)\big)\,\mathrm{d}t.$$
Proof: Recalling that $s = \mathrm{L}^{-1}w$, we get
$$v_i(t) = \langle s, \psi_i(\cdot - t)\rangle = \langle \mathrm{L}^{-1}w, \mathrm{L}^{*}\phi_i(\cdot - t)\rangle = \langle w, \mathrm{L}^{-1*}\mathrm{L}^{*}\phi_i(\cdot - t)\rangle = \big(\phi_i^{\vee} * w\big)(t),$$
where we have used the fact that $\mathrm{L}^{-1*}$ is a valid (continuous) left-inverse of $\mathrm{L}^{*}$ and that $\phi_i \in \mathcal{R}$ has rapid decay (e.g., compactly-supported or, at worst, exponential decay); this allows us to invoke Proposition 3 to prove the first part.
As for the second part, we start from the definition of the characteristic function:
$$\hat{p}_{v_i}(\omega) = \mathbb{E}\{\mathrm{e}^{\mathrm{j}\omega v_i}\} = \mathbb{E}\{\mathrm{e}^{\mathrm{j}\omega\langle s, \psi_{i,k}\rangle}\} = \mathbb{E}\{\mathrm{e}^{\mathrm{j}\omega\langle s, \psi_i\rangle}\} \quad \text{(by stationarity)}$$
$$= \widehat{\mathscr{P}}_s(\omega\psi_i) = \widehat{\mathscr{P}}_w(\omega\,\mathrm{L}^{-1*}\mathrm{L}^{*}\phi_i) = \widehat{\mathscr{P}}_w(\omega\phi_i) = \exp\left(\int_{\mathbb{R}} f\big(\omega\phi_i(t)\big)\,\mathrm{d}t\right),$$
where we have used the left-inverse property of $\mathrm{L}^{-1*}$ and the expression of the Lévy noise functional. The result then follows by identification.⁶
We determine the joint characteristic function of any two wavelet coefficients $Y_1 = \langle s, \psi_{i_1,k_1}\rangle$ and $Y_2 = \langle s, \psi_{i_2,k_2}\rangle$ with indices $(i_1, k_1)$ and $(i_2, k_2)$ using a similar technique.
Property 3 (Wavelet dependencies): The joint characteristic function of the wavelet coefficients $Y_1 = v_{i_1}[k_1] = \langle s, \psi_{i_1,k_1}\rangle$ and $Y_2 = v_{i_2}[k_2] = \langle s, \psi_{i_2,k_2}\rangle$ of the generalized stochastic process $s$ in Property 2 is given by
$$\hat{p}_{Y_1,Y_2}(\omega_1, \omega_2) = \exp\left(\int_{\mathbb{R}} f\Big(\omega_1\phi_{i_1}(t - 2^{i_1}k_1) + \omega_2\phi_{i_2}(t - 2^{i_2}k_2)\Big)\,\mathrm{d}t\right),$$
where $f$ is the Lévy exponent of the innovation process $w$. The coefficients are independent if the kernels $\phi_{i_1}(\cdot - 2^{i_1}k_1)$ and $\phi_{i_2}(\cdot - 2^{i_2}k_2)$ have disjoint support; their correlation is given by
$$\mathbb{E}\{Y_1 Y_2\} = \sigma_0^2\,\big\langle \phi_{i_1}(\cdot - 2^{i_1}k_1), \phi_{i_2}(\cdot - 2^{i_2}k_2)\big\rangle$$
under the assumption that the variance $\sigma_0^2$ of $w$ is finite.
Proof: The first formula is obtained by the substitution of $\varphi = \omega_1\psi_{i_1,k_1} + \omega_2\psi_{i_2,k_2}$ in $\mathbb{E}\{\mathrm{e}^{\mathrm{j}\langle s, \varphi\rangle}\} = \widehat{\mathscr{P}}_w(\mathrm{L}^{-1*}\varphi)$, and simplification using the left-inverse property of $\mathrm{L}^{-1*}$. The statement about independence follows from the exponential nature of the characteristic function and the property that $f(0) = 0$, which allows for the factorization of the characteristic function when the supports of the kernels are distinct (independence of the noise at every point). The correlation formula is obtained by direct application of the first result in Property 1 with $\varphi_1 = \psi_{i_1,k_1} = \mathrm{L}^{*}\phi_{i_1}(\cdot - 2^{i_1}k_1)$ and $\varphi_2 = \psi_{i_2,k_2} = \mathrm{L}^{*}\phi_{i_2}(\cdot - 2^{i_2}k_2)$.

⁶A technical remark is in order here: the substitution of a non-smooth function such as $\phi_i \in \mathcal{R}$ in the characteristic noise functional $\widehat{\mathscr{P}}_w$ is legitimate provided that the domain of continuity of the functional can be extended from $\mathcal{S}$ to $\mathcal{R}$. This is no problem when $f$ is $p$-admissible since we can readily adapt the proof of Theorem 3 to show that $\widehat{\mathscr{P}}_w$ is a continuous, positive-definite functional over $L_p(\mathbb{R})$, which is a much larger space (and with a weaker topology) than both $\mathcal{S}$ and $\mathcal{R}$.
These results provide a complete characterization of the statistical distribution of sparse stochastic processes in some matched wavelet domain. They also indicate that the representation is intrinsically sparse since the transformed-domain statistics are infinitely divisible. Practically, this translates into the wavelet-domain pdfs being heavier-tailed than a Gaussian (unless the process is Gaussian) (cf. argumentation in Section III-D).

To make matters more explicit, we consider the case where the innovation process is S$\alpha$S. The application of Property 2 with $f(\omega) = -|\omega|^{\alpha}$ yields $f_i(\omega) = -\gamma_i|\omega|^{\alpha}$ with dispersion parameter $\gamma_i = \|\phi_i\|_{L_\alpha}^{\alpha}$. This proves that the wavelet coefficients of a generalized S$\alpha$S stochastic process follow S$\alpha$S distributions with the spread of the pdf at scale $i$ being determined by the $L_\alpha$ norm of the corresponding wavelet smoothing kernels. This implies that, for $\alpha < 2$, the process is compressible in the sense that the essential part of the energy content is carried by a tiny fraction of wavelet coefficients [39].
It should be noted, however, that the quality of the decoupling is strongly dependent upon the spread of the wavelet smoothing kernels $\phi_i$, which should be chosen to be maximally localized for best performance. In the case of the first-order system (cf. example in Section II), the basis functions for fixed $i$ are non-overlapping, which implies that the wavelet coefficients within a given scale are independent. This is not so across scales because of the cone-shaped region where the supports of the kernels $\phi_{i_1}$ and $\phi_{i_2}$ overlap, which induces dependencies. Incidentally, the inter-scale correlation of wavelet coefficients is often exploited for improving coding performance [40] and signal reconstruction by imposing joint sparsity constraints [41].
VI. LÉVY PROCESSES REVISITED
We now illustrate our method by specifying classical Lévy processes, denoted by $W(t)$, via the solution of the (marginally unstable) stochastic differential equation
$$\frac{\mathrm{d}}{\mathrm{d}t}W(t) = w(t) \qquad (27)$$
where the driving term $w$ is one of the independent noise processes defined earlier. It is important to keep in mind that Eq. (27), which is the limit of (2) as $\alpha \to 0$, is only a notation whose correct interpretation is $\langle \mathrm{D}W, \varphi\rangle = \langle w, \varphi\rangle$ for all $\varphi \in \mathcal{S}$. We shall consider the solution $W(t)$ for all $t \in \mathbb{R}$, but we shall impose the boundary condition $W(t_0) = 0$ with $t_0 = 0$ to make our construction compatible with the classical one, which is defined for $t \ge 0$.
A. Distributional characterization of Lévy processes
The direct application of the operator formalism developed in Section III yields the solution of (27):
$$W(t) = \mathrm{I}_{0,0}\, w(t),$$
where $\mathrm{I}_{0,0}$ is the unique right inverse of D that imposes the required boundary condition at $t = 0$. The Fourier-based expression of this anti-derivative operator is obtained from the 6th line of Table I by setting $(\omega_0, t_0) = (0, 0)$. By using the properties of the Fourier transform, we obtain the simplified expression
$$\mathrm{I}_{0,0}\,\varphi(t) = \begin{cases} \displaystyle\int_0^t \varphi(\tau)\,\mathrm{d}\tau, & t \ge 0 \\[4pt] \displaystyle-\int_t^0 \varphi(\tau)\,\mathrm{d}\tau, & t < 0, \end{cases} \qquad (28)$$
which allows us to interpret $W(t)$ as the integrated version of $w$ with the proper boundary conditions. Likewise,
we derive the time-domain expression of the adjoint operator
$$\mathrm{I}^{*}_{0,0}\,\varphi(t) = \begin{cases} \displaystyle\int_t^{+\infty} \varphi(\tau)\,\mathrm{d}\tau, & t \ge 0, \\[4pt] \displaystyle-\int_{-\infty}^{t} \varphi(\tau)\,\mathrm{d}\tau, & t < 0. \end{cases} \qquad (29)$$
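A quick numerical reading of (28), under discretization assumptions of our own: replacing $w$ by i.i.d. Gaussian increments on a fine grid, $\mathrm{I}_{0,0}$ becomes a running sum anchored at $t = 0$, and the sample variance of $W(t_1)$ approaches $t_1$, as expected in the Brownian case.

```python
import numpy as np

# Monte-Carlo sketch: W(t) = I_{0,0} w(t) realized as the running integral of
# a discretized Gaussian white noise, with W(0) = 0 by construction.
rng = np.random.default_rng(3)
dt, t1, n_paths = 0.01, 2.0, 20_000
steps = int(round(t1 / dt))
# each bin integrates w over an interval of length dt -> N(0, dt) increments
incr = rng.standard_normal((n_paths, steps)) * np.sqrt(dt)
W_t1 = incr.sum(axis=1)            # value of the integral at t = t1
var_hat = W_t1.var()               # should be close to t1 for Brownian motion
```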
Next, we invoke Proposition 4 to obtain the characteristic form of the Lévy process
$$\widehat{\mathscr{P}}_W(\varphi) = \widehat{\mathscr{P}}_w(\mathrm{I}^{*}_{0,0}\varphi), \qquad (30)$$
which is admissible provided that the Lévy exponent $f$ fulfills the condition in Theorem 3.
We get the characteristic function of the sample values of the Lévy process $W(t_1) = \langle W, \delta(\cdot - t_1)\rangle$ by making the substitution $\varphi = \omega_1\delta(\cdot - t_1)$ in (30): $\widehat{\mathscr{P}}_W\big(\omega_1\delta(\cdot - t_1)\big) = \widehat{\mathscr{P}}_w\big(\omega_1\mathrm{I}^{*}_{0,0}\delta(\cdot - t_1)\big)$ with $t_1 > 0$. We then use (29) to evaluate $\mathrm{I}^{*}_{0,0}\delta(t - t_1) = \mathbb{1}_{[0,t_1)}(t)$. Since the latter indicator function is equal to one for $t \in [0, t_1)$ and zero elsewhere, it is easy to evaluate the integral over $t$ in (5) with $f(0) = 0$, which yields
$$\mathbb{E}\big\{\mathrm{e}^{\mathrm{j}\omega_1 W(t_1)}\big\} = \exp\left(\int_{\mathbb{R}} f\big(\omega_1\mathbb{1}_{[0,t_1)}(t)\big)\,\mathrm{d}t\right) = \mathrm{e}^{t_1 f(\omega_1)}.$$
This result is equivalent to the celebrated Lévy-Khinchine representation of the process [27].
B. Lévy increments vs. wavelet coefficients
A fundamental property of Lévy processes is that their increments at equally-spaced intervals are i.i.d. [27]. To see how this fits into the present framework, we specify the increments on the integer grid as the special case of (3) with $\alpha = 0$:
$$u[k] = \Delta_0 W(k) := W(k) - W(k-1) = \int_{k-1}^{k} w(t)\,\mathrm{d}t = \langle w, \beta_0(k - \cdot)\rangle,$$
where $\beta_0(t) = \mathbb{1}_{[0,1)}(t) = \Delta_0\rho_0(t)$ is the causal B-spline of degree 0 (rectangular function). We are also introducing some new notation, which is consistent with the definitions given in [28, Table II], to set the stage for the generalizations to come: $\Delta_0$ is the finite-difference operator, which is the discrete analog of the derivative operator D, while $\rho_0$ (unit step) is the Green function of the derivative operator D. The main point of the exercise is to show that determining increments is structurally equivalent to the computation of the wavelet coefficients in Property 2 with the smoothing kernel $\phi_i$ being substituted by $\beta_0$. It follows that the characteristic function of $u[\cdot]$ is given by
$$\hat{p}_u(\omega) = \exp\left(\int_{\mathbb{R}} f\big(\omega\beta_0(t)\big)\,\mathrm{d}t\right) = \mathrm{e}^{f(\omega)} = \hat{p}_{\mathrm{id}}(\omega), \qquad (31)$$
where the simplification of the integral results from the binary nature of $\beta_0$, which is either 1 (on a support of size 1) or zero. This implies that the increments of the Lévy process are independent (because the B-spline functions $\beta_0(\cdot - k)$ are non-overlapping) and that their pdf is given by the canonical id distribution of the innovation process $p_{\mathrm{id}}(x)$ (cf. discussion in Section III-D).
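Relation (31) is easy to cross-check by simulation in the compound Poisson case. The parameters below are our own illustrative choices (rate $\lambda = 1$ and standard Gaussian jump amplitudes, for which the Lévy exponent is $f(\omega) = \lambda(\mathrm{e}^{-\omega^2/2} - 1)$); the empirical characteristic function of the simulated unit increments matches $\mathrm{e}^{f(\omega)}$.

```python
import numpy as np

# Simulate unit-interval increments of a compound Poisson process with rate
# lam and N(0,1) jump amplitudes, then compare the empirical characteristic
# function with exp(f(w)), where f(w) = lam * (exp(-w**2 / 2) - 1).
rng = np.random.default_rng(1)
lam, n = 1.0, 200_000
counts = rng.poisson(lam, n)                 # number of jumps per increment
# a sum of c i.i.d. N(0,1) amplitudes has the same law as sqrt(c) * N(0,1)
u = np.sqrt(counts) * rng.standard_normal(n)
omega = np.array([0.5, 1.0, 2.0])
emp = np.exp(1j * np.outer(omega, u)).mean(axis=1)   # empirical E[exp(jwu)]
theo = np.exp(lam * (np.exp(-omega**2 / 2) - 1.0))   # exp(f(w)), real here
```

The simulation also exhibits the sparsity mechanism directly: an increment is exactly zero whenever no jump occurs, which happens with probability $\mathrm{e}^{-\lambda}$.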
The alternative is to expand the Lévy process in the Haar basis, which is ideally matched to it. Indeed, the Haar wavelet at scale $i = 1$ (lower-left function in Fig. 2) can be expressed as
$$\psi_{\mathrm{Haar}}(t/2) = \beta_0(t) - \beta_0(t - 1) = \Delta_0\beta_0 = \mathrm{D}\beta_{(0,0)}(t), \qquad (32)$$
where $\beta_{(0,0)} = \beta_0 * \beta_0$ is the causal B-spline of degree 1 (triangle function).

Since $\Delta_0 W$ is i.i.d. in all cases, it does not yield a sparse representation in this first instance because the underlying distribution remains Gaussian. The second process, which may be termed Lévy-Laplace motion, is specified by the Lévy density $v(a) = \mathrm{e}^{-|a|}/|a|$, which is not in $L_1$. By taking the inverse Fourier transform of (31), we can show that its increment process has a Laplace distribution [22]; note that this type of generalized Gaussian model is often used to justify sparsity-promoting signal-processing techniques based on $\ell_1$ minimization [42]-[44]. The third piecewise-constant signal is a compound Poisson process. It is intrinsically sparse since a good proportion of its increments is zero by construction (with probability $\mathrm{e}^{-\lambda}$), making it compressible in the strongest sense [39]. Specifically, we can compress the sequence such as to preserve any prescribed portion $r < 1$ of its average

Fig. 3. Examples of Lévy motions $W(t)$ with increasing degrees of sparsity. (a) Brownian motion with Lévy triplet $(0, 1, 0)$. (b) Lévy-Laplace motion with $\big(0, 0, \mathrm{e}^{-|a|}/|a|\big)$. (c) Compound Poisson motion with Lévy density proportional to $\mathrm{e}^{-a^2/2}$ and $\lambda = \tfrac{1}{32}$. (d) Symmetric Lévy flight with $\big(0, 0, 1/|a|^{\alpha+1}\big)$ and $\alpha = 1.2$.
$$\sum_{m=1}^{N}\sum_{n=1}^{N} f(\omega_m - \omega_n)\,\xi_m\overline{\xi_n} \ge 0$$
for every possible choice of $\omega_1, \ldots, \omega_N \in \mathbb{R}$, $\xi_1, \ldots, \xi_N \in \mathbb{C}$ and $N \in \mathbb{Z}_+$. This is equivalent to the requirement that the $N \times N$ matrix $\mathbf{F}$ whose elements are given by $[\mathbf{F}]_{mn} = f(\omega_m - \omega_n)$ is positive semi-definite (that is, non-negative definite) for all $N$, no matter how the $\omega_n$'s are chosen.
Bochner's theorem states that a bounded, continuous function $f$ is positive-definite if and only if it is the Fourier transform of a positive and finite Borel measure $\mu$:
$$f(\omega) = \int_{\mathbb{R}} \mathrm{e}^{\mathrm{j}\omega x}\,\mu(\mathrm{d}x).$$
In particular, Bochner's theorem implies that $f$ is a valid characteristic function, that is, $f(\omega) = \mathbb{E}\{\mathrm{e}^{\mathrm{j}\omega X}\} = \int_{\mathbb{R}} \mathrm{e}^{\mathrm{j}\omega x}\,\mu(\mathrm{d}x)$ where $X$ is a random variable with probability measure $P_X = \mu$, iff $f$ is continuous, positive-definite, and $f(0) = 1$. Note that the above results and formulas also generalize to the multivariate setting.
These concepts carry over as well to functionals on some abstract nuclear space $\mathcal{A}$, the prime example being Schwartz's class $\mathcal{S}$ of smooth and rapidly-decreasing test functions [17].
Definition 3: A complex-valued functional $\mathcal{L}(\varphi)$ defined over the function space $\mathcal{A}$ is said to be positive-definite iff
$$\sum_{m=1}^{N}\sum_{n=1}^{N} \mathcal{L}(\varphi_m - \varphi_n)\,\xi_m\overline{\xi_n} \ge 0$$
for every possible choice of $\varphi_1, \ldots, \varphi_N \in \mathcal{A}$, $\xi_1, \ldots, \xi_N \in \mathbb{C}$ and $N \in \mathbb{N}_+$.
⁸A wavelet with $N$ vanishing moments can always be rewritten as $\psi = \mathrm{D}^{N}\phi$ with $\phi \in L_2(\mathbb{R})$, where the operator $\mathrm{L} = \mathrm{D}^{N}$ is scale-invariant.
Theorem 4 (Minlos-Bochner): Given a functional $\widehat{\mathscr{P}}_s(\varphi)$ on a nuclear space $\mathcal{A}$ that is continuous, positive-definite, and such that $\widehat{\mathscr{P}}_s(0) = 1$, there exists a unique probability measure $\mathscr{P}_s$ on the dual space $\mathcal{A}'$ such that
$$\widehat{\mathscr{P}}_s(\varphi) = \mathbb{E}\{\mathrm{e}^{\mathrm{j}\langle s, \varphi\rangle}\} = \int_{\mathcal{A}'} \mathrm{e}^{\mathrm{j}\langle s, \varphi\rangle}\,\mathrm{d}\mathscr{P}_s(s),$$
where $\langle s, \varphi\rangle$ is the dual pairing map. One further has the guarantee that all finite-dimensional probability measures derived from $\widehat{\mathscr{P}}_s(\varphi)$ by setting $\varphi = \omega_1\varphi_1 + \cdots + \omega_N\varphi_N$ are mutually compatible.

The characteristic form therefore uniquely specifies the generalized stochastic process $s = s(\varphi)$ (via the infinite-dimensional probability measure $\mathscr{P}_s$) in essentially the same way as the characteristic function fully determines the probability measure of a scalar or multivariate random variable.
APPENDIX II: PROOF OF THEOREM 3
1) As $w$ is a generalized random process, $\widehat{\mathscr{P}}_w$ is a continuous functional on $\mathcal{S}$. This, together with the assumption that $\mathrm{T}$ is a continuous operator on $\mathcal{S}$, implies that the composed functional $\widehat{\mathscr{P}}_s(\varphi) := \widehat{\mathscr{P}}_w(\mathrm{T}\varphi)$ is continuous on $\mathcal{S}$.
Given the functions $\varphi_1, \ldots, \varphi_N$ in $\mathcal{S}$ and some complex coefficients $\xi_1, \ldots, \xi_N$,
$$\sum_{1\le m,n\le N} \widehat{\mathscr{P}}_s(\varphi_m - \varphi_n)\,\xi_m\overline{\xi_n} = \sum_{1\le m,n\le N} \widehat{\mathscr{P}}_w\big(\mathrm{T}(\varphi_m - \varphi_n)\big)\,\xi_m\overline{\xi_n}$$
$$= \sum_{1\le m,n\le N} \widehat{\mathscr{P}}_w(\mathrm{T}\varphi_m - \mathrm{T}\varphi_n)\,\xi_m\overline{\xi_n} \quad \text{(by the linearity of the operator T)}$$
$$\ge 0 \quad \text{(by the positive-definiteness of } \widehat{\mathscr{P}}_w).$$
This proves the positive-definiteness of the functional $\widehat{\mathscr{P}}_s$ on $\mathcal{S}$. Clearly, $\widehat{\mathscr{P}}_s(0) = \widehat{\mathscr{P}}_w(\mathrm{T}0) = \widehat{\mathscr{P}}_w(0) = 1$.
2) By the continuity of the operator $\mathrm{T}$ from $\mathcal{S}$ into $L_p$, $\mathrm{T}\varphi \in L_p$ for all $\varphi \in \mathcal{S}$. This, together with the assumption $|f(u)| \le C|u|^p$, implies that $\widehat{\mathscr{P}}_s(\varphi) = \exp\big(\int_{\mathbb{R}} f(\mathrm{T}\varphi(t))\,\mathrm{d}t\big)$ is well-defined for all $\varphi \in \mathcal{S}$. By the linearity of the operator $\mathrm{T}$ and $f(0) = 0$, we obtain that $\widehat{\mathscr{P}}_s(0) = 1$. The positive-definiteness of the functional $\widehat{\mathscr{P}}_s$ is established by an argument similar to the one used above. Finally, we prove the continuity of the functional $\widehat{\mathscr{P}}_s$ on $\mathcal{S}$: Let $\{\varphi_n\}_{n=1}^{\infty}$ be a convergent sequence in $\mathcal{S}$ and denote its limit in $\mathcal{S}$ by $\varphi$. Then, by the assumption on the linear operator $\mathrm{T}$, $\mathrm{T}\varphi_n$ converges to $\mathrm{T}\varphi$ in $L_p$; that is,
$$\lim_{n\to\infty} \|\mathrm{T}\varphi_n - \mathrm{T}\varphi\|_p = 0. \qquad (33)$$
Next, we observe that
$$|f(u) - f(v)| = \left|\int_v^u f'(t)\,\mathrm{d}t\right| \le C\left|\int_v^u |t|^{p-1}\,\mathrm{d}t\right| \le C\big(|v|^{p-1}|u - v| + |u - v|^p\big),$$
so that
$$\left|\int_{\mathbb{R}} f\big(\mathrm{T}\varphi_n(t)\big)\,\mathrm{d}t - \int_{\mathbb{R}} f\big(\mathrm{T}\varphi(t)\big)\,\mathrm{d}t\right| \le C\int_{\mathbb{R}} \Big(|\mathrm{T}\varphi(t)|^{p-1}\,|\mathrm{T}\varphi_n(t) - \mathrm{T}\varphi(t)| + |\mathrm{T}\varphi_n(t) - \mathrm{T}\varphi(t)|^p\Big)\,\mathrm{d}t$$
$$\le C\Big(\|\mathrm{T}\varphi\|_p^{p-1}\,\|\mathrm{T}\varphi_n - \mathrm{T}\varphi\|_p + \|\mathrm{T}\varphi_n - \mathrm{T}\varphi\|_p^p\Big) \quad \text{(by Hölder's inequality)}$$
$$\to 0 \text{ as } n \to \infty \quad \text{(by (33))},$$
which proves the continuity of the functional $\widehat{\mathscr{P}}_s$ on $\mathcal{S}$.
ACKNOWLEDGEMENTS
The research was partially supported by the Swiss National Science Foundation under Grant 200020-109415, the European Commission under Grant ERC-2010-AdG 267439-FUN-SP, and the National Science Foundation under Grant DMS 1109063. The authors are thankful to Prof. Victor Panaretos (EPFL Chair of Mathematical Statistics) and Prof. Robert Dalang (EPFL Chair of Probabilities) for helpful discussions.
REFERENCES
[1] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill, 1991.
[2] R. Gray and L. Davisson, An Introduction to Statistical Signal Processing. Cambridge University Press, 2004.
[3] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21-30, 2008.
[4] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Review, vol. 51, no. 1, pp. 34-81, 2009.
[5] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. San Diego: Academic Press, 2009.
[6] J.-L. Starck, F. Murtagh, and J. M. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
[7] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, 2010.
[8] R. Baraniuk, E. Candès, M. Elad, and Y. Ma, "Applications of sparse representation and compressive sensing," Proceedings of the IEEE, vol. 98, no. 6, pp. 906-909, 2010.
[9] M. Elad, M. Figueiredo, and Y. Ma, "On the role of sparse and redundant representations in image processing," Proceedings of the IEEE, vol. 98, no. 6, pp. 972-982, 2010.
[10] M. A. T. Figueiredo and R. D. Nowak, "An EM algorithm for wavelet-based image restoration," IEEE Transactions on Image Processing, vol. 12, no. 8, pp. 906-916, 2003.
[11] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413-1457, 2004.
[12] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183-202, 2009.
[13] T. Kailath, "The innovations approach to detection and estimation theory," Proceedings of the IEEE, vol. 58, no. 5, pp. 680-695, May 1970.
[14] G. Giannakis and J. Mendel, "Cumulant-based order determination of non-Gaussian ARMA models," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 8, pp. 1411-1423, Aug. 1990.
[15] A. Swami, G. B. Giannakis, and J. M. Mendel, "Linear modeling of multidimensional non-Gaussian processes using cumulants," Multidimensional Systems and Signal Processing, vol. 1, pp. 11-37, 1990.
[16] P. Rao, D. Johnson, and D. Becker, "Generation and analysis of non-Gaussian Markov time series," IEEE Transactions on Signal Processing, vol. 40, no. 4, pp. 845-856, Apr. 1992.
[17] I. Gelfand and N. Y. Vilenkin, Generalized Functions. Vol. 4. Applications of Harmonic Analysis. New York, USA: Academic Press, 1964.
[18] I. Karatzas and S. Shreve, Brownian Motion and Stochastic Calculus, 2nd ed. New York: Springer, 1991.
[19] B. Øksendal, Stochastic Differential Equations, 6th ed. Springer, 2007.
[20] G. Samorodnitsky and M. S. Taqqu, Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, 1994.
[21] D. Applebaum, Lévy Processes and Stochastic Calculus, 2nd ed. Cambridge University Press, 2009.
[22] M. Unser and P. Tafti, "Stochastic models for sparse and piecewise-smooth signals," IEEE Transactions on Signal Processing, vol. 59, no. 3, pp. 989-1005, March 2011.
[23] M. Shao and C. Nikias, "Signal processing with fractional lower order moments: Stable processes and their applications," Proceedings of the IEEE, vol. 81, no. 7, pp. 986-1010, July 1993.
[24] P. Brockwell, "Lévy-driven CARMA processes," Annals of the Institute of Statistical Mathematics, vol. 53, pp. 113-124, 2001.
[25] Q. Sun and M. Unser, "Left inverses of fractional Laplacian and sparse stochastic processes," Advances in Computational Mathematics, vol. 36, no. 3, pp. 399-441, April 2012.
[26] P. Lévy, Le Mouvement Brownien. Paris, France: Gauthier-Villars, 1954.
[27] K.-I. Sato, Lévy Processes and Infinitely Divisible Distributions. Chapman & Hall, 1994.
[28] M. Unser, P. Tafti, A. Amini, and H. Kirshner, "A unified formulation of Gaussian vs. sparse stochastic processes, Part II: Discrete-domain theory," IEEE Transactions on Signal Processing, submitted.
[29] N. Ahmed, "Discrete cosine transform," IEEE Transactions on Communications, vol. 23, no. 1, pp. 90-93, Sep. 1974.
[30] M. Unser, "On the approximation of the discrete Karhunen-Loève transform for stationary processes," Signal Processing, vol. 7, no. 3, pp. 231-249, December 1984.
[31] N. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Application to Speech and Video Coding. Prentice-Hall, 1984.
[32] I. Khalidov and M. Unser, "From differential equations to the construction of new wavelet-like bases," IEEE Transactions on Signal Processing, vol. 54, no. 4, pp. 1256-1267, April 2006.
[33] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 2, 2nd ed. New York: Wiley, 1971.
[34] F. W. Steutel and K. Van Harn, Infinite Divisibility of Probability Distributions on the Real Line. Marcel Dekker, 2003.
[35] I. Gelfand and G. Shilov, Generalized Functions. Vol. 1. Properties and Operations. New York, USA: Academic Press, 1964.
[36] A. Bose, A. Dasgupta, and H. Rubin, "A contemporary review and bibliography of infinitely divisible distributions and processes," Sankhya: The Indian Journal of Statistics, Series A, vol. 64, no. 3, pp. 763-819, 2002.
[37] B. Ramachandran, "On characteristic functions and moments," Sankhya: The Indian Journal of Statistics, Series A, vol. 31, no. 1, pp. 1-12, 1969.
[38] S. J. Wolfe, "On moments of infinitely divisible distribution functions," The Annals of Mathematical Statistics, vol. 42, no. 6, pp. 2036-2043, 1971.
[39] A. Amini, M. Unser, and F. Marvasti, "Compressibility of deterministic and random infinite sequences," IEEE Transactions on Signal Processing, vol. 59, no. 11, pp. 5193-5201, November 2011.
[40] J. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 41, no. 12, pp. 3445-3462, 1993.
[41] M. Crouse, R. Nowak, and R. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Transactions on Signal Processing, vol. 46, no. 4, pp. 886-902, Apr. 1998.
[42] C. Bouman and K. Sauer, "A generalized Gaussian image model for edge-preserving MAP estimation," IEEE Transactions on Image Processing, vol. 2, no. 3, pp. 296-310, Jul. 1993.
[43] M. W. Seeger and H. Nickisch, "Compressed sensing and Bayesian experimental design," in Proceedings of the 25th International Conference on Machine Learning, ser. ICML '08. New York, NY, USA: ACM, 2008, pp. 912-919.
[44] S. Babacan, R. Molina, and A. Katsaggelos, "Bayesian compressive sensing using Laplace priors," IEEE Transactions on Image Processing, vol. 19, pp. 53-64, January 2010.
[45] P. Protter, Stochastic Integration and Differential Equations. New York: Springer, 2004.
[46] J. Stewart, "Positive definite functions and generalizations, an historical survey," Rocky Mountain Journal of Mathematics, vol. 6, no. 3, pp. 409-434, 1976.