
PRONY’S METHOD IN SIGNAL ANALYSIS

CHARLES L. BYRNE

Abstract. In his 1795 paper Prony presented a method for analyzing a sum of exponential functions. While this method works in theory, it can fail in the presence of noise or modeling errors. In this note I present Prony's method, relate it to modern-day stochastic spectral analysis, and use this analogy to suggest improvements to Prony's method.

1. Prony’s Problem
The problem Prony solved in [5] is to determine J and the complex numbers a_j and γ_j from finitely many values of the function

(1.1)    f(t) = \sum_{j=1}^{J} a_j e^{tγ_j}.

If we take the γ_j = iω_j to be purely imaginary, f(t) becomes a sum of complex sinusoids. If we take the γ_j to be real, then f(t) is a sum of real exponentials, either increasing or decreasing.
The date of publication of [5] is often taken by editors to be a typographical error and is replaced by 1995; or, since it is not written in English, perhaps 1895. But the 1795 date is the correct one. The mathematical problem Prony solved remains important in signal analysis, and his method for solving it is still used today [6]. Prony's method anticipates some of the eigenvector methods described in [1, 3].

2. Prony’s Solution Method


Suppose that we have the data vector d with entries d_n = f(n), for n = 1, ..., N = 2M, and J ≤ M. We seek a nonzero vector c with entries c_k, k = 0, ..., M, such that

(2.1)    0 = \sum_{k=0}^{M} c_k f(m+k) = c_0 f(m) + c_1 f(m+1) + ... + c_M f(m+M),


for m = 1, ..., M. In matrix-vector notation, which was unavailable to Prony, we are solving the linear system
    
\begin{pmatrix} d_1 & d_2 & \cdots & d_{M+1} \\ d_2 & d_3 & \cdots & d_{M+2} \\ \vdots & \vdots & & \vdots \\ d_M & d_{M+1} & \cdots & d_N \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_M \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},

which we write as Dc = 0. Since D†Dc = 0 also, we see that c is an eigenvector associated with the eigenvalue zero of the Hermitian nonnegative definite matrix D†D, with D† the conjugate transpose of the matrix D.
Fix a value of m and replace each of the f(m + k) in Equation (2.1) with the value given by Equation (1.1) to get

(2.2)    0 = \sum_{j=1}^{J} a_j \left( \sum_{k=0}^{M} c_k e^{kγ_j} \right) e^{mγ_j}.

For each γ we define the vector e(γ) ∈ C^M as

(2.3)    e(γ) ≐ (e^{γ}, e^{2γ}, ..., e^{Mγ})^T.

With C(z) = c_0 + c_1 z + ... + c_M z^M and b_j = a_j C(e^{γ_j}), Equation (2.2) becomes

(2.4)    \sum_{j=1}^{J} b_j e(γ_j) = 0.

We assume that no collection of the e(γ) with M members is linearly dependent. Therefore the b_j are all zero, from which it follows that either a_j = 0 or C(e^{γ_j}) = 0. Consequently the polynomial C(z) has roots at those values z = e^{γ_j} for which a_j ≠ 0. Once we find the roots of this polynomial we have the values of e^{γ_j}. Then we obtain the a_j by solving a linear system of equations. In practice we would not know J, so we would overestimate it somewhat by selecting J = M; as a result, some of the a_j would be zero.
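To make these steps concrete, here is a small numerical sketch in Python with NumPy. The synthetic data, the use of the singular value decomposition to obtain a null vector of D, and the variable names are my own illustrative choices, not part of Prony's original formulation:

    import numpy as np

    # Noise-free synthetic data: J = 2 exponentials sampled at n = 1, ..., N = 2M.
    M, N = 4, 8
    gammas = np.array([-0.10 + 2.0j, -0.05 + 0.7j])   # true exponents gamma_j
    amps = np.array([1.0, 0.5])                       # true coefficients a_j
    n = np.arange(1, N + 1)
    d = np.exp(np.outer(n, gammas)) @ amps            # d_n = f(n)

    # The M x (M+1) matrix D whose mth row is (d_m, ..., d_{m+M}).
    D = np.array([d[m:m + M + 1] for m in range(M)])

    # A nonzero c with Dc = 0: the right singular vector for the smallest singular value.
    c = np.linalg.svd(D)[2].conj()[-1]

    # The roots of C(z) = sum_k c_k z^k include the values e^{gamma_j}.
    roots = np.roots(c[::-1])                         # np.roots wants the highest power first
    gamma_est = np.log(roots)

    # Recover the a_j by solving the linear system d_n = sum_j a_j e^{n gamma_j}.
    a_est, *_ = np.linalg.lstsq(np.exp(np.outer(n, gamma_est)), d, rcond=None)
    print(gamma_est)                                  # two entries lie near the true gamma_j
    print(np.round(a_est, 6))                         # coefficients of spurious roots are near zero

Since M exceeds J in this sketch, some of the recovered roots are spurious; as noted above, their coefficients come out (numerically) zero.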
Note that D^m ≐ Row_m(D), the mth row of the matrix D, can be written as

(2.5)    D^m = \sum_{j=1}^{J} (a_j e^{mγ_j}) s_j,

where

(2.6)    s_j ≐ (1, e^{γ_j}, e^{2γ_j}, ..., e^{Mγ_j}).

Therefore the row space of D is spanned by the vectors s_j, j = 1, ..., J, and the rank of D is at most J. It follows that there is a nonzero vector c that is orthogonal to each of the vectors s_j for which a_j is not zero.
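These two facts are easy to check numerically; the following short sketch, with the same hypothetical data as above, is my own illustration:

    import numpy as np

    # With J = 2 exponentials and M = 4, D has rank J, and a null vector c of D
    # satisfies C(e^{gamma_j}) = sum_k c_k e^{k gamma_j} = 0 for each j.
    M = 4
    gammas = np.array([-0.10 + 2.0j, -0.05 + 0.7j])
    n = np.arange(1, 2 * M + 1)
    d = np.exp(np.outer(n, gammas)) @ np.array([1.0, 0.5])
    D = np.array([d[m:m + M + 1] for m in range(M)])
    c = np.linalg.svd(D)[2].conj()[-1]                    # a null vector of D

    s = np.exp(np.outer(gammas, np.arange(M + 1)))        # rows are s_j = (1, e^{gamma_j}, ..., e^{M gamma_j})
    print(np.linalg.matrix_rank(D))                       # equals J = 2
    print(np.abs(s @ c).max())                            # essentially zero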

3. An Illustration
For the sake of illustration we consider the case of M = 3 and N = 6,
with J ≤ M . Suppose that our data are the real numbers dn , n = 1, ..., 6.
The matrix D is now

D = \begin{pmatrix} d_1 & d_2 & d_3 & d_4 \\ d_2 & d_3 & d_4 & d_5 \\ d_3 & d_4 & d_5 & d_6 \end{pmatrix}

and the matrix D†D = D^T D is

D^T D = \begin{pmatrix}
d_1^2 + d_2^2 + d_3^2 & d_1 d_2 + d_2 d_3 + d_3 d_4 & d_1 d_3 + d_2 d_4 + d_3 d_5 & d_1 d_4 + d_2 d_5 + d_3 d_6 \\
d_2 d_1 + d_3 d_2 + d_4 d_3 & d_2^2 + d_3^2 + d_4^2 & d_2 d_3 + d_3 d_4 + d_4 d_5 & d_2 d_4 + d_3 d_5 + d_4 d_6 \\
d_3 d_1 + d_4 d_2 + d_5 d_3 & d_3 d_2 + d_4 d_3 + d_5 d_4 & d_3^2 + d_4^2 + d_5^2 & d_3 d_4 + d_4 d_5 + d_5 d_6 \\
d_4 d_1 + d_5 d_2 + d_6 d_3 & d_4 d_2 + d_5 d_3 + d_6 d_4 & d_4 d_3 + d_5 d_4 + d_6 d_5 & d_4^2 + d_5^2 + d_6^2
\end{pmatrix}.
Note that, for each pair of indices u and v, the entry (D^T D)_{u,v} is a sum of products d_i d_{i+(v−u)}, each with the same index difference v − u. The sum, however, does not run over all of the pairs with that difference.
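A brief numerical check of this structure; the data values below are arbitrary, chosen only to display the pattern:

    import numpy as np

    d = np.array([1.0, 2.0, -1.0, 0.5, 3.0, -2.0])    # hypothetical values d_1, ..., d_6
    D = np.array([d[m:m + 4] for m in range(3)])      # rows (d_{m+1}, ..., d_{m+4}), m = 0, 1, 2
    G = D.T @ D

    # In the one-based indexing of the text, G[u, v] = d_u d_v + d_{u+1} d_{v+1} + d_{u+2} d_{v+2}:
    # three products, all with the same lag v - u, but not every product with that lag appears.
    print(np.round(G, 3))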

4. A Stochastic View of Prony’s Problem


Prony’s method works in theory but can fail when there is modeling error
or noise in the data. Because noise is usually modeled in terms of random
variables it is helpful to take a stochastic view of Prony’s problem.
We consider now a stochastic version of the function f(t) in Equation (1.1): let

(4.1)    f(t) = \sum_{j=1}^{J} A_j e^{tγ_j},

where the A_j are complex random variables. We can then view the entries d_n of the random data vector d as instances of a random variable X_n given by

(4.2)    X_n ≐ \sum_{j=1}^{J} A_j e^{nγ_j}.

We take as our data M independent samples of the random data vector d, which we denote by d^m, for m = 1, ..., M. Then we denote the nth entry of the vector d^m by

(4.3)    d^m_n = \sum_{j=1}^{J} a^m_j e^{nγ_j},

where, for each j, the a^m_j, m = 1, ..., M, are M independent samples of the random variable A_j. We let S be the matrix whose mth row is the vector d^m. Then S^m, the mth row of the matrix S, has the form

(4.4)    S^m = \sum_{j=1}^{J} (a^m_j) s_j.

When we compare the mth row of the matrix S with the mth row of the matrix D, as given by Equation (2.5), we see that the coefficients a^m_j in Equation (4.4) and (a_j e^{mγ_j}) in Equation (2.5) play similar roles. This will help us when we modify Prony's method to deal with noise.
We denote by R_s the correlation matrix for d in the noise-free case; that is,

(4.5)    (R_s)_{k,n} = E(d_k \overline{d_n}).

Then R_s is approximated by (1/M) S†S.
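The following sketch illustrates the estimate (1/M) S†S; the number of samples, the Gaussian distribution used for the A_j, and the variable names are my own choices:

    import numpy as np

    rng = np.random.default_rng(0)
    J, N, M = 2, 8, 500
    gammas = np.array([2.0j, 0.7j])                    # purely imaginary exponents
    n = np.arange(1, N + 1)
    E = np.exp(np.outer(n, gammas))                    # column j holds e^{n gamma_j}, n = 1, ..., N

    # M independent samples a^m_j of the coefficients A_j; row m of S is the sample vector d^m.
    A = (rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J))) / np.sqrt(2)
    S = A @ E.T                                        # S[m, n-1] = sum_j a^m_j e^{n gamma_j}

    Rs_hat = S.conj().T @ S / M                        # (1/M) S^dagger S, an estimate of R_s
    print(Rs_hat.shape)                                # (N, N)
    print(np.linalg.norm(Rs_hat - Rs_hat.conj().T))    # Hermitian, so this is (numerically) zero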

5. Allowing for Additive Noise


Suppose now that the random variable d_n has the form

(5.1)    d_n = \sum_{j=1}^{J} A_j e^{nγ_j} + z_n,

where the z_n, n = 1, ..., N, are random variables that are independent of one another and of the random variables A_j, with means equal to zero and variances equal to σ². This is often described as data containing additive white noise. Then the expected value of d_k \overline{d_n} becomes

(5.2)    (R_d)_{k,n} = E(d_k \overline{d_n}) = (R_s)_{k,n} + σ² δ(n − k),

where δ(n − k) equals zero if n ≠ k and one if n = k. We then have R_d = R_s + σ² I, where I is the identity matrix.
When the data contains additive white noise the matrix (1/M) S†S is a statistical estimate of the matrix R_d. Consequently, the contribution of the noise is primarily to increase the main diagonal of S†S. To the extent that the matrix D†D is analogous to the matrix S†S, we should expect the effect of additive white noise to be primarily an increase in the values on the main diagonal of D†D. This suggests that, instead of looking for a nonzero vector c such that Dc = 0, we should select c to be an eigenvector of D†D corresponding to the smallest eigenvalue of that matrix. It will then follow that such a vector c should be (nearly) orthogonal to each of the vectors s_j for which a_j ≠ 0.
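Here is a minimal sketch of the suggested modification; the noise level, the use of a Hermitian eigen-decomposition routine, and the data are my own illustrative choices:

    import numpy as np

    rng = np.random.default_rng(1)
    M, N = 4, 8
    gammas = np.array([2.0j, 0.7j])
    n = np.arange(1, N + 1)
    d = np.exp(np.outer(n, gammas)) @ np.array([1.0, 0.5])
    d = d + 0.01 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))   # additive white noise

    D = np.array([d[m:m + M + 1] for m in range(M)])
    evals, evecs = np.linalg.eigh(D.conj().T @ D)      # eigenvalues in ascending order
    c = evecs[:, 0]                                    # eigenvector of the smallest eigenvalue

    roots = np.roots(c[::-1])                          # roots of C(z)
    print(np.sort(np.angle(roots)))                    # two of the angles should sit near 0.7 and 2.0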

6. More General Signal Vectors


In this section we describe a model that extends that of Prony to allow for more general signal vectors. Let S ≐ {e(θ), θ ∈ Θ} ⊆ C^N be the collection of all potential signal column vectors, where Θ is some metric space of parameters, each e(θ) is a unit vector, and no subset of S with N or fewer members is linearly dependent. Each measurement vector is a single realization of the random column vector d in C^N, given by

(6.1)    d = \sum_{j=1}^{J} A_j e(θ_j) + z,

where the A_j, j = 1, ..., J, are uncorrelated complex random variables with mean zero, J < N, and the noise z is a complex random vector with possibly correlated entries. Then the correlation matrix for our measurements is R ≐ R_d, given by

(6.2)    R = E(dd†) = \sum_{j=1}^{J} E(|A_j|²) e(θ_j) e(θ_j)† + E(zz†) = R_s + Q.

6.1. A Linear Estimator. The expected value of the magnitude squared of the matched filter e(θ)†d is

(6.3)    A(θ) ≐ E(|e(θ)†d|²) = e(θ)† R e(θ).

We think of A(θ) as a “linear” estimator because it does not involve the inverse of R, nor explicit calculation of the eigenvectors and eigenvalues of R. In practice we estimate R by averaging several realizations of the random matrix dd†.
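For the sinusoidal case, with e(θ) proportional to (e^{iθ}, e^{2iθ}, ..., e^{Niθ})^T, a short sketch of this estimator follows; the unit-vector normalization, the grid of θ values, and the toy correlation matrix are my own choices:

    import numpy as np

    N = 16
    n = np.arange(1, N + 1)

    def e_vec(theta):
        return np.exp(1j * theta * n) / np.sqrt(N)     # unit signal vector e(theta)

    # A toy correlation matrix R = R_s + Q: two sources plus white noise.
    R = (np.outer(e_vec(0.7), e_vec(0.7).conj())
         + 0.5 * np.outer(e_vec(2.0), e_vec(2.0).conj())
         + 0.05 * np.eye(N))

    grid = np.linspace(0.0, np.pi, 512)
    spectrum = np.array([(e_vec(t).conj() @ R @ e_vec(t)).real for t in grid])
    peaks = grid[1:-1][(spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])]
    print(peaks)                                       # local maxima; these include values near 0.7 and 2.0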

6.2. Prewhiten, then Match. As I discussed in [2], the best linear unbiased estimator (BLUE) for estimating γ, when d is the random data vector d = γs + z, Q = E(zz†), and x is the vector of measurements, is

(6.4)    γ̂ = (1/(s†Q^{-1}s)) s†Q^{-1}x.

With Q = CC† we can write

(6.5)    γ̂ = (1/(s†Q^{-1}s)) (C^{-1}s)† (C^{-1}x).

Multiplying by C^{-1} is called “prewhitening,” so the optimal estimator involves prewhitening, followed by a matched filter.
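A small sketch of this estimator, written as prewhitening followed by a matched filter; the Cholesky factorization Q = CC† and the toy covariance below are my own choices:

    import numpy as np

    def blue(s, x, Q):
        C = np.linalg.cholesky(Q)                      # Q = C C^dagger
        ws = np.linalg.solve(C, s)                     # prewhitened signal C^{-1} s
        wx = np.linalg.solve(C, x)                     # prewhitened data   C^{-1} x
        return (ws.conj() @ wx) / (ws.conj() @ ws)     # matched filter applied to the whitened quantities

    # Toy example: x = gamma * s + z, with correlated noise of covariance Q.
    rng = np.random.default_rng(2)
    N, gamma_true = 8, 2.0 - 1.0j
    s = np.exp(1j * 0.7 * np.arange(1, N + 1)) / np.sqrt(N)
    Q = 0.1 * 0.5 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # decaying correlations
    z = np.linalg.cholesky(Q) @ (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    x = gamma_true * s + z
    print(blue(s, x, Q))                               # roughly recovers gamma_true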

6.3. Modifying A(θ) Using Prewhitening. With s = e(θ) the prewhitened signal becomes t(θ) ≐ C^{-1}e(θ), the prewhitened data vector becomes g ≐ C^{-1}d, and the prewhitened correlation matrix becomes P ≐ C^{-1}R(C^{-1})† = E(gg†). In place of the estimator A(θ) we have

(6.6)    B(θ) ≐ E(|t(θ)†g|²) = e(θ)† Q^{-1}RQ^{-1} e(θ).
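A brief sketch of the prewhitened estimator B(θ); the matrices R and Q below are small hypothetical examples, used only to show the computation:

    import numpy as np

    N = 16
    n = np.arange(1, N + 1)

    def e_vec(theta):
        return np.exp(1j * theta * n) / np.sqrt(N)

    Q = 0.05 * np.eye(N) + 0.02 * np.ones((N, N))      # a toy, non-white noise correlation
    R = (np.outer(e_vec(0.7), e_vec(0.7).conj())
         + 0.5 * np.outer(e_vec(2.0), e_vec(2.0).conj()) + Q)
    Qinv = np.linalg.inv(Q)

    def B(theta):
        e = e_vec(theta)
        return (e.conj() @ Qinv @ R @ Qinv @ e).real   # e(theta)^dagger Q^{-1} R Q^{-1} e(theta)

    print(B(0.7), B(2.0), B(1.3))                      # noticeably larger at the two signal parameters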
6 C. BYRNE

6.4. Capon’s Estimator. When N is not large and some of the θj are
close to one another the functions A(θ) and B(θ) may not be able to resolve
these closely spaced components of R [1]. To improve resolution we can turn
to high-resolution methods.
As we saw in [2], the BLUE is based on finding the vector b that minimizes
b† Qb, subject to b† e(θ) = 1. The problem with this approach is that we
cannot determine Q from measurements and only know Q from theoretical
models. Capon [4] suggests finding, for fixed θ, the vector h(θ) = h that
minimizes h† Rh, subject to h† e(θ) = 1. The vector h(θ) is then
(6.7)    h(θ) = (1/(e(θ)†R^{-1}e(θ))) R^{-1}e(θ).

The idea here is that the filter h(θ) suppresses every component of the data
that is not a multiple of e(θ). This includes, but is not limited to, the
background noise. Capon’s estimator is then the function of θ defined by
E(|h(θ)†d|²) and given by

(6.8)    C(θ) ≐ 1/(e(θ)†R^{-1}e(θ)).

When the fixed θ is not one of the signal parameters, C(θ) is relatively small. This leads to improved resolution, since when θ lies between two actual signal parameters the value of C(θ) will typically be smaller than its value at either of them. However, as I discussed in [1], this improved resolution can be lost when the data is perturbed by phase errors.
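The following sketch compares Capon's estimator with the linear estimator A(θ) on a toy correlation matrix with two closely spaced, equal-power sources; the grid, the spacing, and the noise level are my own choices:

    import numpy as np

    N = 16
    n = np.arange(1, N + 1)

    def e_vec(theta):
        return np.exp(1j * theta * n) / np.sqrt(N)

    # Two closely spaced, equal-power sources plus white noise (sigma^2 = 0.01).
    R = (np.outer(e_vec(1.00), e_vec(1.00).conj())
         + np.outer(e_vec(1.25), e_vec(1.25).conj()) + 0.01 * np.eye(N))
    Rinv = np.linalg.inv(R)

    grid = np.linspace(0.8, 1.45, 300)
    capon = np.array([1.0 / (e_vec(t).conj() @ Rinv @ e_vec(t)).real for t in grid])
    linear = np.array([(e_vec(t).conj() @ R @ e_vec(t)).real for t in grid])

    def local_maxima(vals):
        return grid[1:-1][(vals[1:-1] > vals[:-2]) & (vals[1:-1] > vals[2:])]

    print(local_maxima(capon))    # typically two peaks, near 1.00 and 1.25: the sources are resolved
    print(local_maxima(linear))   # typically a single merged peak near the midpoint 1.125

In this setting the two parameters are separated by less than the Fourier resolution limit 2π/N, which is why the linear estimator tends to blur them together while Capon's estimator can still separate them.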

References
[1] Byrne, C. (2021) “Noise in high-resolution signal processing,” posted on ResearchGate, July 20, 2021.
[2] Byrne, C. (2021) “Modified inner products in signal detection,” posted on ResearchGate, August 6, 2021.
[3] Byrne, C. (2021) “Avoiding prewhitening through dimensionality reduction in array processing,” posted on ResearchGate, August 9, 2021.
[4] Capon, J. (1969) “High-resolution frequency-wavenumber spectrum analysis,” Proc. of the IEEE 57, pp. 1408–1418.
[5] Prony, G.R.B. (1795) “Essai expérimental et analytique sur les lois de la dilatabilité de fluides élastiques et sur celles de la force expansive de la vapeur de l’alcool, à différentes températures,” Journal de l’École Polytechnique (Paris) 1(2), pp. 24–76.
[6] Therrien, C. (1992) Discrete Random Signals and Statistical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.

(C. Byrne) Department of Mathematical Sciences, University of Massachusetts Lowell, Lowell, MA, USA
E-mail address: Charles Byrne@uml.edu
