
PRONY’S METHOD IN SIGNAL ANALYSIS

CHARLES L. BYRNE

Abstract. In his 1795 paper Prony presented a method for analyzing a sum of exponential functions. While this method works in theory, it can fail in the presence of noise or modeling errors. In this note I present Prony's method, relate it to modern-day stochastic spectral analysis, and use this analogy to suggest improvements to Prony's method.

1. Prony’s Problem
The problem Prony solved in [5] is to determine J and the complex numbers a_j and γ_j from finitely many values of the function

(1.1)    f(t) = \sum_{j=1}^{J} a_j e^{tγ_j}.

If we take the γ_j = iω_j to be purely imaginary, f(t) becomes a sum of complex sinusoids. If we take the γ_j to be real, then f(t) is a sum of real exponentials, either increasing or decreasing.
The date of publication of [5] is often taken by editors to be a typographical error and is replaced by 1995; or, since it is not written in English, perhaps 1895. But the 1795 date is the correct one. The mathematical problem Prony solved remains important in signal analysis, and his method for solving it is still used today [6]. Prony's method anticipates some of the eigenvector methods described in [1, 3].

2. Prony’s Solution Method


Suppose that we have the data vector d with entries d_n = f(n), for n = 1, ..., N = 2M, and J ≤ M. We seek a nonzero vector c with entries c_k, k = 0, ..., M, such that

(2.1)    0 = \sum_{k=0}^{M} c_k f(m+k) = c_0 f(m) + c_1 f(m+1) + ... + c_M f(m+M),


for m = 1, ..., M. In matrix-vector notation, which was unavailable to Prony, we are solving the linear system
    
\begin{pmatrix} d_1 & d_2 & \cdots & d_{M+1} \\ d_2 & d_3 & \cdots & d_{M+2} \\ \vdots & \vdots & & \vdots \\ d_M & d_{M+1} & \cdots & d_N \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_M \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},

which we write as Dc = 0. Since D†Dc = 0 also, we see that c is an eigenvector associated with the eigenvalue zero of the Hermitian nonnegative definite matrix D†D, with D† the conjugate transpose of the matrix D.
Fix a value of m and replace each of the f(m + k) in Equation (2.1) with the value given by Equation (1.1) to get

(2.2)    0 = \sum_{j=1}^{J} a_j \left( \sum_{k=0}^{M} c_k e^{kγ_j} \right) e^{mγ_j}.

For each γ we define the vector e(γ) ∈ C^M as

(2.3)    e(γ) ≐ (e^{γ}, e^{2γ}, ..., e^{Mγ})^T.

With C(z) = c_0 + c_1 z + ... + c_M z^M and b_j = a_j C(e^{γ_j}), Equation (2.2) becomes

(2.4)    \sum_{j=1}^{J} b_j e(γ_j) = 0.

We assume that no collection of the e(γ) with M members is linearly dependent. Therefore the b_j are all zero, from which it follows that either a_j = 0 or C(e^{γ_j}) = 0. Consequently the polynomial C(z) has roots at those values z = e^{γ_j} for which a_j ≠ 0. Once we find the roots of this polynomial we have the values of e^{γ_j}. Then we obtain the a_j by solving a linear system of equations. In practice we would not know J, so we would overestimate it somewhat by selecting J = M; as a result, some of the a_j would be zero.
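To make these steps concrete, here is a small numerical sketch in Python with NumPy. The synthetic data, the use of the singular value decomposition to obtain a null vector of D, and the variable names are my own illustrative choices, not part of Prony's original formulation:

    import numpy as np

    # Noise-free synthetic data: J = 2 exponentials sampled at n = 1, ..., N = 2M.
    M, N = 4, 8
    gammas = np.array([-0.10 + 2.0j, -0.05 + 0.7j])   # true exponents gamma_j
    amps = np.array([1.0, 0.5])                       # true coefficients a_j
    n = np.arange(1, N + 1)
    d = np.exp(np.outer(n, gammas)) @ amps            # d_n = f(n)

    # The M x (M+1) matrix D whose mth row is (d_m, ..., d_{m+M}).
    D = np.array([d[m:m + M + 1] for m in range(M)])

    # A nonzero c with Dc = 0: the right singular vector for the smallest singular value.
    c = np.linalg.svd(D)[2].conj()[-1]

    # The roots of C(z) = sum_k c_k z^k include the values e^{gamma_j}.
    roots = np.roots(c[::-1])                         # np.roots wants the highest power first
    gamma_est = np.log(roots)

    # Recover the a_j by solving the linear system d_n = sum_j a_j e^{n gamma_j}.
    a_est, *_ = np.linalg.lstsq(np.exp(np.outer(n, gamma_est)), d, rcond=None)
    print(gamma_est)                                  # two entries lie near the true gamma_j
    print(np.round(a_est, 6))                         # coefficients of spurious roots are near zero

Since M exceeds J in this sketch, some of the recovered roots are spurious; as noted above, their coefficients come out (numerically) zero.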
Note that D^m ≐ Row_m(D), the mth row of the matrix D, can be written as

(2.5)    D^m = \sum_{j=1}^{J} (a_j e^{mγ_j}) s_j,

where

(2.6)    s_j ≐ (1, e^{γ_j}, e^{2γ_j}, ..., e^{Mγ_j}).

Therefore the row space of D is spanned by the vectors s_j, j = 1, ..., J, and the rank of D is at most J. It follows that there is a nonzero vector c that is orthogonal to each of the vectors s_j for which a_j is not zero.
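These two facts are easy to check numerically; the following short sketch, with the same hypothetical data as above, is my own illustration:

    import numpy as np

    # With J = 2 exponentials and M = 4, D has rank J, and a null vector c of D
    # satisfies C(e^{gamma_j}) = sum_k c_k e^{k gamma_j} = 0 for each j.
    M = 4
    gammas = np.array([-0.10 + 2.0j, -0.05 + 0.7j])
    n = np.arange(1, 2 * M + 1)
    d = np.exp(np.outer(n, gammas)) @ np.array([1.0, 0.5])
    D = np.array([d[m:m + M + 1] for m in range(M)])
    c = np.linalg.svd(D)[2].conj()[-1]                    # a null vector of D

    s = np.exp(np.outer(gammas, np.arange(M + 1)))        # rows are s_j = (1, e^{gamma_j}, ..., e^{M gamma_j})
    print(np.linalg.matrix_rank(D))                       # equals J = 2
    print(np.abs(s @ c).max())                            # essentially zero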

3. An Illustration
For the sake of illustration we consider the case of M = 3 and N = 6,
with J ≤ M . Suppose that our data are the real numbers dn , n = 1, ..., 6.
The matrix D is now

D = \begin{pmatrix} d_1 & d_2 & d_3 & d_4 \\ d_2 & d_3 & d_4 & d_5 \\ d_3 & d_4 & d_5 & d_6 \end{pmatrix}

and the matrix D†D = D^T D is

D^T D = \begin{pmatrix}
d_1^2 + d_2^2 + d_3^2 & d_1 d_2 + d_2 d_3 + d_3 d_4 & d_1 d_3 + d_2 d_4 + d_3 d_5 & d_1 d_4 + d_2 d_5 + d_3 d_6 \\
d_2 d_1 + d_3 d_2 + d_4 d_3 & d_2^2 + d_3^2 + d_4^2 & d_2 d_3 + d_3 d_4 + d_4 d_5 & d_2 d_4 + d_3 d_5 + d_4 d_6 \\
d_3 d_1 + d_4 d_2 + d_5 d_3 & d_3 d_2 + d_4 d_3 + d_5 d_4 & d_3^2 + d_4^2 + d_5^2 & d_3 d_4 + d_4 d_5 + d_5 d_6 \\
d_4 d_1 + d_5 d_2 + d_6 d_3 & d_4 d_2 + d_5 d_3 + d_6 d_4 & d_4 d_3 + d_5 d_4 + d_6 d_5 & d_4^2 + d_5^2 + d_6^2
\end{pmatrix}.
Note that, for each pair of indices u and v, the entry (D^T D)_{u,v} is a sum of products d_i d_{i+(v−u)}, each with the same index difference v − u. The sum, however, does not run over all of the pairs with that difference.
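A brief numerical check of this structure; the data values below are arbitrary, chosen only to display the pattern:

    import numpy as np

    d = np.array([1.0, 2.0, -1.0, 0.5, 3.0, -2.0])    # hypothetical values d_1, ..., d_6
    D = np.array([d[m:m + 4] for m in range(3)])      # rows (d_{m+1}, ..., d_{m+4}), m = 0, 1, 2
    G = D.T @ D

    # In the one-based indexing of the text, G[u, v] = d_u d_v + d_{u+1} d_{v+1} + d_{u+2} d_{v+2}:
    # three products, all with the same lag v - u, but not every product with that lag appears.
    print(np.round(G, 3))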

4. A Stochastic View of Prony’s Problem


Prony’s method works in theory but can fail when there is modeling error
or noise in the data. Because noise is usually modeled in terms of random
variables it is helpful to take a stochastic view of Prony’s problem.
We consider now a stochastic version of the function f(t) in Equation (1.1): let

(4.1)    f(t) = \sum_{j=1}^{J} A_j e^{tγ_j},

where the A_j are complex random variables. We can then view the entries d_n of the random data vector d as instances of a random variable X_n given by

(4.2)    X_n ≐ \sum_{j=1}^{J} A_j e^{nγ_j}.

We take as our data M independent samples of the random data vector d, which we denote by d^m, for m = 1, ..., M. Then we denote the nth entry of the vector d^m by

(4.3)    d^m_n = \sum_{j=1}^{J} a^m_j e^{nγ_j},

where, for each j, the a^m_j, m = 1, ..., M, are M independent samples of the random variable A_j. We let S be the matrix whose mth row is the vector d^m. Then S^m, the mth row of the matrix S, has the form

(4.4)    S^m = \sum_{j=1}^{J} (a^m_j) s_j.

When we compare the mth row of the matrix S with the mth row of the matrix D, as given by Equation (2.5), we see that the coefficients a^m_j in Equation (4.4) and (a_j e^{mγ_j}) in Equation (2.5) play similar roles. This will help us when we modify Prony's method to deal with noise.
We denote by R_s the correlation matrix for d in the noise-free case; that is,

(4.5)    (R_s)_{k,n} = E(d_k \overline{d_n}).

Then R_s is approximated by (1/M) S†S.
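The following sketch illustrates the estimate (1/M) S†S; the number of samples, the Gaussian distribution used for the A_j, and the variable names are my own choices:

    import numpy as np

    rng = np.random.default_rng(0)
    J, N, M = 2, 8, 500
    gammas = np.array([2.0j, 0.7j])                    # purely imaginary exponents
    n = np.arange(1, N + 1)
    E = np.exp(np.outer(n, gammas))                    # column j holds e^{n gamma_j}, n = 1, ..., N

    # M independent samples a^m_j of the coefficients A_j; row m of S is the sample vector d^m.
    A = (rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J))) / np.sqrt(2)
    S = A @ E.T                                        # S[m, n-1] = sum_j a^m_j e^{n gamma_j}

    Rs_hat = S.conj().T @ S / M                        # (1/M) S^dagger S, an estimate of R_s
    print(Rs_hat.shape)                                # (N, N)
    print(np.linalg.norm(Rs_hat - Rs_hat.conj().T))    # Hermitian, so this is (numerically) zero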

5. Allowing for Additive Noise


Suppose now that the random variable d_n has the form

(5.1)    d_n = \sum_{j=1}^{J} A_j e^{nγ_j} + z_n,

where the z_n, n = 1, ..., N, are random variables that are independent of one another and of the random variables A_j, with means equal to zero and variances equal to σ². This is often described as data containing additive white noise. Then the expected value of d_k \overline{d_n} becomes

(5.2)    (R_d)_{k,n} = E(d_k \overline{d_n}) = (R_s)_{k,n} + σ² δ(n − k),

where δ(n − k) equals zero if n ≠ k and one if n = k. We then have R_d = R_s + σ² I, where I is the identity matrix.
When the data contains additive white noise the matrix (1/M) S†S is a statistical estimate of the matrix R_d. Consequently, the contribution of the noise is primarily to increase the main diagonal of S†S. To the extent that the matrix D†D is analogous to the matrix S†S, we should expect the effect of additive white noise to be primarily an increase in the values on the main diagonal of D†D. This suggests that, instead of looking for a nonzero vector c such that Dc = 0, we should select c to be an eigenvector of D†D corresponding to the smallest eigenvalue of that matrix. It will then follow that such a vector c should be (nearly) orthogonal to each of the vectors s_j for which a_j ≠ 0.
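Here is a minimal sketch of the suggested modification; the noise level, the use of a Hermitian eigen-decomposition routine, and the data are my own illustrative choices:

    import numpy as np

    rng = np.random.default_rng(1)
    M, N = 4, 8
    gammas = np.array([2.0j, 0.7j])
    n = np.arange(1, N + 1)
    d = np.exp(np.outer(n, gammas)) @ np.array([1.0, 0.5])
    d = d + 0.01 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))   # additive white noise

    D = np.array([d[m:m + M + 1] for m in range(M)])
    evals, evecs = np.linalg.eigh(D.conj().T @ D)      # eigenvalues in ascending order
    c = evecs[:, 0]                                    # eigenvector of the smallest eigenvalue

    roots = np.roots(c[::-1])                          # roots of C(z)
    print(np.sort(np.angle(roots)))                    # two of the angles should sit near 0.7 and 2.0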

6. More General Signal Vectors


In this section we describe a model that extends that of Prony to allow for more general signal vectors. Let S ≐ {e(θ), θ ∈ Θ} ⊆ C^N be the collection of all potential signal column vectors, where Θ is some metric space of parameters, each e(θ) is a unit vector, and no subset of S with N or fewer members is linearly dependent. Each measurement vector is a single realization of the random column vector d in C^N, given by

(6.1)    d = \sum_{j=1}^{J} A_j e(θ_j) + z,

where the A_j, j = 1, ..., J, are uncorrelated complex random variables with mean zero, J < N, and the noise z is a complex random vector with possibly correlated entries. Then the correlation matrix for our measurements is R ≐ R_d, given by

(6.2)    R = E(dd†) = \sum_{j=1}^{J} E(|A_j|²) e(θ_j) e(θ_j)† + E(zz†) = R_s + Q.

6.1. A Linear Estimator. The expected value of the magnitude squared of the matched filter e(θ)†d is

(6.3)    A(θ) ≐ E(|e(θ)†d|²) = e(θ)† R e(θ).

We think of A(θ) as a “linear” estimator because it does not involve the inverse of R, nor explicit calculation of the eigenvectors and eigenvalues of R. In practice we estimate R by averaging several realizations of the random matrix dd†.
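For the sinusoidal case, with e(θ) proportional to (e^{iθ}, e^{2iθ}, ..., e^{Niθ})^T, a short sketch of this estimator follows; the unit-vector normalization, the grid of θ values, and the toy correlation matrix are my own choices:

    import numpy as np

    N = 16
    n = np.arange(1, N + 1)

    def e_vec(theta):
        return np.exp(1j * theta * n) / np.sqrt(N)     # unit signal vector e(theta)

    # A toy correlation matrix R = R_s + Q: two sources plus white noise.
    R = (np.outer(e_vec(0.7), e_vec(0.7).conj())
         + 0.5 * np.outer(e_vec(2.0), e_vec(2.0).conj())
         + 0.05 * np.eye(N))

    grid = np.linspace(0.0, np.pi, 512)
    spectrum = np.array([(e_vec(t).conj() @ R @ e_vec(t)).real for t in grid])
    peaks = grid[1:-1][(spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])]
    print(peaks)                                       # local maxima; these include values near 0.7 and 2.0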

6.2. Prewhiten, then Match. As I discussed in [2], the best linear unbiased estimator (BLUE) for estimating γ, when d is the random data vector d = γs + z, Q = E(zz†), and x is the vector of measurements, is

(6.4)    γ̂ = (1/(s†Q^{-1}s)) s†Q^{-1}x.

With Q = CC† we can write

(6.5)    γ̂ = (1/(s†Q^{-1}s)) (C^{-1}s)† (C^{-1}x).

Multiplying by C^{-1} is called “prewhitening,” so the optimal estimator involves prewhitening, followed by a matched filter.
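A small sketch of this estimator, written as prewhitening followed by a matched filter; the Cholesky factorization Q = CC† and the toy covariance below are my own choices:

    import numpy as np

    def blue(s, x, Q):
        C = np.linalg.cholesky(Q)                      # Q = C C^dagger
        ws = np.linalg.solve(C, s)                     # prewhitened signal C^{-1} s
        wx = np.linalg.solve(C, x)                     # prewhitened data   C^{-1} x
        return (ws.conj() @ wx) / (ws.conj() @ ws)     # matched filter applied to the whitened quantities

    # Toy example: x = gamma * s + z, with correlated noise of covariance Q.
    rng = np.random.default_rng(2)
    N, gamma_true = 8, 2.0 - 1.0j
    s = np.exp(1j * 0.7 * np.arange(1, N + 1)) / np.sqrt(N)
    Q = 0.1 * 0.5 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # decaying correlations
    z = np.linalg.cholesky(Q) @ (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    x = gamma_true * s + z
    print(blue(s, x, Q))                               # roughly recovers gamma_true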

6.3. Modifying A(θ) Using Prewhitening. With s = e(θ) the prewhitened signal becomes t(θ) ≐ C^{-1}e(θ), the prewhitened data vector becomes g ≐ C^{-1}d, and the prewhitened correlation matrix becomes P ≐ C^{-1}R(C^{-1})† = E(gg†). In place of the estimator A(θ) we have

(6.6)    B(θ) ≐ E(|t(θ)†g|²) = e(θ)† Q^{-1}RQ^{-1} e(θ).
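A brief sketch of the prewhitened estimator B(θ); the matrices R and Q below are small hypothetical examples, used only to show the computation:

    import numpy as np

    N = 16
    n = np.arange(1, N + 1)

    def e_vec(theta):
        return np.exp(1j * theta * n) / np.sqrt(N)

    Q = 0.05 * np.eye(N) + 0.02 * np.ones((N, N))      # a toy, non-white noise correlation
    R = (np.outer(e_vec(0.7), e_vec(0.7).conj())
         + 0.5 * np.outer(e_vec(2.0), e_vec(2.0).conj()) + Q)
    Qinv = np.linalg.inv(Q)

    def B(theta):
        e = e_vec(theta)
        return (e.conj() @ Qinv @ R @ Qinv @ e).real   # e(theta)^dagger Q^{-1} R Q^{-1} e(theta)

    print(B(0.7), B(2.0), B(1.3))                      # noticeably larger at the two signal parameters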
6 C. BYRNE

6.4. Capon’s Estimator. When N is not large and some of the θj are
close to one another the functions A(θ) and B(θ) may not be able to resolve
these closely spaced components of R [1]. To improve resolution we can turn
to high-resolution methods.
As we saw in [2], the BLUE is based on finding the vector b that minimizes
b† Qb, subject to b† e(θ) = 1. The problem with this approach is that we
cannot determine Q from measurements and only know Q from theoretical
models. Capon [4] suggests finding, for fixed θ, the vector h(θ) = h that
minimizes h† Rh, subject to h† e(θ) = 1. The vector h(θ) is then
(6.7)    h(θ) = (1/(e(θ)†R^{-1}e(θ))) R^{-1}e(θ).

The idea here is that the filter h(θ) suppresses every component of the data
that is not a multiple of e(θ). This includes, but is not limited to, the
background noise. Capon’s estimator is then the function of θ defined by
E(|h(θ)†d|²) and given by

(6.8)    C(θ) ≐ 1/(e(θ)†R^{-1}e(θ)).

When the fixed θ is not one of the signal parameters, C(θ) is relatively small. This leads to improved resolution, since when θ lies between two actual signal parameters the value of C(θ) will typically be smaller than its value at either of them. However, as I discussed in [1], this improved resolution can be lost when the data is perturbed by phase errors.
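The following sketch compares Capon's estimator with the linear estimator A(θ) on a toy correlation matrix with two closely spaced, equal-power sources; the grid, the spacing, and the noise level are my own choices:

    import numpy as np

    N = 16
    n = np.arange(1, N + 1)

    def e_vec(theta):
        return np.exp(1j * theta * n) / np.sqrt(N)

    # Two closely spaced, equal-power sources plus white noise (sigma^2 = 0.01).
    R = (np.outer(e_vec(1.00), e_vec(1.00).conj())
         + np.outer(e_vec(1.25), e_vec(1.25).conj()) + 0.01 * np.eye(N))
    Rinv = np.linalg.inv(R)

    grid = np.linspace(0.8, 1.45, 300)
    capon = np.array([1.0 / (e_vec(t).conj() @ Rinv @ e_vec(t)).real for t in grid])
    linear = np.array([(e_vec(t).conj() @ R @ e_vec(t)).real for t in grid])

    def local_maxima(vals):
        return grid[1:-1][(vals[1:-1] > vals[:-2]) & (vals[1:-1] > vals[2:])]

    print(local_maxima(capon))    # typically two peaks, near 1.00 and 1.25: the sources are resolved
    print(local_maxima(linear))   # typically a single merged peak near the midpoint 1.125

In this setting the two parameters are separated by less than the Fourier resolution limit 2π/N, which is why the linear estimator tends to blur them together while Capon's estimator can still separate them.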

References
[1] Byrne, C. (2021) “Noise in high-resolution signal processing,” posted on ResearchGate, July 20, 2021.
[2] Byrne, C. (2021) “Modified inner products in signal detection,” posted on ResearchGate, August 6, 2021.
[3] Byrne, C. (2021) “Avoiding prewhitening through dimensionality reduction in array processing,” posted on ResearchGate, August 9, 2021.
[4] Capon, J. (1969) “High-resolution frequency-wavenumber spectrum analysis,” Proc. of the IEEE 57, pp. 1408–1418.
[5] Prony, G.R.B. (1795) “Essai expérimental et analytique sur les lois de la dilatabilité de fluides élastiques et sur celles de la force expansive de la vapeur de l’alcool, à différentes températures,” Journal de l’École Polytechnique (Paris) 1(2), pp. 24–76.
[6] Therrien, C. (1992) Discrete Random Signals and Statistical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.

(C. Byrne) Department of Mathematical Sciences, University of Massachusetts Lowell, Lowell, MA, USA
E-mail address: Charles Byrne@uml.edu
