Prewhitening PDF

Prewhitening
What is Prewhitening? Prewhitening is an operation that processes a time series (or some other
data sequence) to make it behave statistically like white noise. The ‘pre’ means that whitening
precedes some other analysis that likely works better if the additive noise is white.
These operations can be viewed in either the time domain or the frequency domain:
1. Make the ACF of the time series appear more like a delta function.
2. Make the spectrum appear flat.
Example data sets that may require prewhitening:
1. A well-behaved noise process with an additive low-frequency (or polynomial) trend.
2. A deterministic signal with an additive red-noise process.
Viewed in the frequency domain, prewhitening means that the dynamic range of the measured data
is reduced.
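As a rough illustration of the two views above (delta-like ACF, flat spectrum), the following sketch whitens a simulated AR(1) red-noise process. The AR(1) model, the coefficient `a`, and the helper names `lag1_autocorr` and `dynamic_range_db` are assumptions made for this example only; they are not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096
a = 0.95  # assumed AR(1) coefficient (hypothetical value)

# AR(1) red noise: x[t] = a * x[t-1] + e[t], with e white
e = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = a * x[t - 1] + e[t]

# For this particular model the whitening filter is known exactly:
# x[t] - a * x[t-1] recovers the white driving noise e[t].
xw = x[1:] - a * x[:-1]

def lag1_autocorr(z):
    """Normalized autocorrelation at lag 1 (near 0 for white noise)."""
    z = z - z.mean()
    return np.dot(z[1:], z[:-1]) / np.dot(z, z)

def dynamic_range_db(z, nblock=32):
    """Max/min ratio (in dB) of a block-averaged periodogram, DC excluded."""
    p = np.abs(np.fft.rfft(z - z.mean())) ** 2
    p = p[1:]
    p = p[: (p.size // nblock) * nblock].reshape(-1, nblock).mean(axis=1)
    return 10.0 * np.log10(p.max() / p.min())

print("lag-1 ACF before/after:", lag1_autocorr(x), lag1_autocorr(xw))
print("dynamic range (dB) before/after:", dynamic_range_db(x), dynamic_range_db(xw))
```

In this sketch the filter is exact only because the noise model is known; with real data the whitening filter has to be estimated.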
Why bother? Recall from our discussions of spectral analysis the issues of leakage and bias. These
arise from sidelobes inherent to spectral estimation. We can minimize leakage in two ways: (1)
make sidelobes smaller and (2) minimize the power that is prone to leaking into sidelobes. Spectral
windows address the former while prewhitening mitigates the latter. Leakage into sidelobes also
constitutes bias in spectral estimates. However, bias also appears in other data analysis procedures.
Consider least-squares fitting of a sinusoid to a signal of the form
x(t) = A cos(ωt + φ) + r(t) + n(t),
where n(t) is WSS white noise and r(t) is red noise with a steep power spectrum. Red noise
can strongly bias fitting of a model x̂(t) = Â cos(ω̂t + φ̂) because its power can leak across the
underlying spectrum causing a least-squares fit to give highly discrepant values of Â, ω̂, and φ̂.
Prewhitening of the time series ideally would yield a transformed time series of the form
x′(t) = A′ cos(ωt + φ) + n′(t)
to which fitting a sinusoidal model will be less biased.
Procedures:
We have already seen one analysis that is related to prewhitening: the matched filter (MF). The
MF doesn’t whiten the spectrum of the output but it does weight the frequency components of the
measured quantity to maximize the S/N of the signal.
The signal model in this case is x(t) = a0A(t) + n(t). Recall for an arbitrary spectrum Sn(f) for
additive noise that the frequency-domain MF for a signal A(t) is
\[ \tilde{h}(f) \propto \frac{\tilde{A}(f)}{S_n(f)}. \]
Taking equality for simplicity, when the filter is applied to the measurements x(t), we have
\[ \tilde{y}(f) = \tilde{x}(f)\,\tilde{h}^*(f) \propto \frac{a_0 |\tilde{A}(f)|^2}{S_n(f)} + \frac{\tilde{n}(f)\,\tilde{A}^*(f)}{S_n(f)}. \]
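This means that the ensemble-average spectrum of the filter output is
\[
\begin{aligned}
\langle |\tilde{y}(f)|^2 \rangle
 &= \frac{a_0^2 |\tilde{A}(f)|^4}{S_n^2(f)} + \frac{\langle |\tilde{n}(f)|^2 \rangle\, |\tilde{A}(f)|^2}{S_n^2(f)} \\
 &= \frac{a_0^2 |\tilde{A}(f)|^4}{S_n^2(f)} + \frac{S_n(f)\, |\tilde{A}(f)|^2}{S_n^2(f)} \\
 &= \frac{a_0^2 |\tilde{A}(f)|^4}{S_n^2(f)} + \frac{|\tilde{A}(f)|^2}{S_n(f)} \\
 &= \frac{|\tilde{A}(f)|^2}{S_n(f)} \left[ \frac{a_0^2 |\tilde{A}(f)|^2}{S_n(f)} + 1 \right].
\end{aligned}
\]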
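A minimal numerical sketch of the frequency-domain MF follows, applied to a noise-free, shifted pulse so that the signal term of the output spectrum can be checked directly. The Gaussian template, the form of `Sn`, and the values of `a0` and `t0` are all assumptions chosen for illustration.

```python
import numpy as np

N = 512
t = np.arange(N)

# Assumed template A(t): a Gaussian pulse centred on t = 0 (periodic wraparound)
A = np.exp(-0.5 * (np.minimum(t, N - t) / 4.0) ** 2)

# Assumed noise spectrum: white floor plus a steep red component
f = np.fft.fftfreq(N)
Sn = 1.0 + 1e-3 / (np.abs(f) + 1.0 / N) ** 2

A_f = np.fft.fft(A)
h_f = A_f / Sn                      # matched filter, taking equality

# Noise-free measurement: the template scaled by a0 and shifted to t0
a0, t0 = 3.0, 200
x = a0 * np.roll(A, t0)

# Filter output y~(f) = x~(f) h~*(f), transformed back to the time domain
y = np.fft.ifft(np.fft.fft(x) * np.conj(h_f)).real
peak = int(np.argmax(y))

# Signal term of the output spectrum: a0^2 |A~|^4 / Sn^2
signal_term = a0**2 * np.abs(A_f) ** 4 / Sn**2
print("output peaks at sample", peak)
```

The output peaks at the pulse location because the filtered spectrum a0|Ã|²/Sn is real and non-negative apart from the linear phase of the shift.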
Signals with trends: A common situation is one where a quantity of the form a0A(t) + n(t) is
superposed with a strong trend, such as a baseline variation. Similar issues arise in measurements
of spectra.
Consequences of trends include:
1. Bias in estimating parameters of A(t − t0) or its spectral analog A(ν − ν0).
2. Erroneous estimates of cross correlations between two time series such as
x(t) = s1(t) + n1(t) and y(t) = s2(t) + n2(t),
where s1,2 are signals of interest and n1,2 are measurement errors. I.e. we may be interested in
the correlation
\[ C = \frac{1}{N_t} \sum_t s_1(t)\, s_2(t) \quad \text{or} \quad C = \frac{1}{N_t} \sum_t [s_1(t) - \bar{s}_1][s_2(t) - \bar{s}_2], \]
where s̄1,2 = (1/Nt) Σt s1,2(t) are the sample means.
If there are trends p1,2(t) added to x(t) and y(t) the correlation Ĉ of x and y used to estimate C may
be dominated completely by the trends and not the signal parts of the measurements.
A fix: Trends can often be modeled as a polynomial of some order that can be fitted to the
measurements. The order of the polynomial needs to be chosen ‘wisely.’ For a pulse or spectral line
confined to some range of t or ν this is straightforward. But for a detection problem where the
signal location is not known, the situation is very tricky.
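The effect of trends on a correlation estimate, and the polynomial fix, can be sketched as follows. The two "signals" here are independent white sequences, so the true correlation is near zero. The trend coefficients, the polynomial `order`, and the helper names `corrcoef` and `detrend` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
Nt = 500
t = np.linspace(-1.0, 1.0, Nt)

# Two independent sequences: the true correlation is close to zero
s1 = rng.standard_normal(Nt)
s2 = rng.standard_normal(Nt)

# Assumed quadratic trends p1(t), p2(t) added to each measurement
x = s1 + 10.0 * t + 3.0 * t**2
y = s2 + 8.0 * t - 2.0 * t**2

def corrcoef(u, v):
    """Normalized, mean-subtracted correlation of two sequences."""
    u = u - u.mean()
    v = v - v.mean()
    return np.dot(u, v) / np.sqrt(np.dot(u, u) * np.dot(v, v))

def detrend(z, t, order):
    """Subtract a least-squares polynomial of the given order."""
    coeffs = np.polynomial.polynomial.polyfit(t, z, order)
    return z - np.polynomial.polynomial.polyval(t, coeffs)

order = 2  # assumed; in practice the order has to be chosen with care
c_raw = corrcoef(x, y)
c_detrended = corrcoef(detrend(x, t, order), detrend(y, t, order))
print("correlation with trends:", c_raw)
print("correlation after detrending:", c_detrended)
```

Here the polynomial order matches the simulated trend by construction, which is exactly the knowledge that is missing in the detection problem described above.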
Prewhitening filter: Consider again x(t) = a0A(t) + n(t) and let’s trivially construct a
frequency-domain filter that whitens the measurements.
We want a filter h(t) that flattens the noise n(t) in the frequency domain. Let y(t) = x(t) ⊗ h(t),
where ⊗ means convolution. All we need is
\[ \tilde{h}(f) = \frac{1}{\sqrt{S_n(f)}}. \]
Then the ensemble spectrum of the output ỹ(f) is
\[
\begin{aligned}
\langle |\tilde{y}(f)|^2 \rangle
 &= \langle |\tilde{x}(f)|^2 \rangle \, |\tilde{h}(f)|^2 \\
 &= \frac{\langle |\tilde{x}(f)|^2 \rangle}{S_n(f)} \\
 &= \frac{a_0^2 \langle |\tilde{A}(f)|^2 \rangle}{S_n(f)} + 1.
\end{aligned}
\]
Note how this differs from the result for a matched filter. But the result is that in the mean the
spectrum of the additive noise has been flattened.
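A minimal sketch of this filter applied with the FFT is given below. It assumes the noise spectrum `Sn` is known exactly, and its functional form is hypothetical. The red noise is made by coloring white noise with √Sn, so the whitening step should recover the white sequence.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1024
f = np.fft.rfftfreq(N)                    # frequencies in cycles per sample

# Assumed (known) noise spectrum: white floor plus a red component
Sn = 1.0 + 1e-2 / (f + 1.0 / N) ** 2

# Colour a white sequence so that its spectrum follows Sn(f)
white = rng.standard_normal(N)
n = np.fft.irfft(np.fft.rfft(white) * np.sqrt(Sn), n=N)

# Whitening filter h~(f) = 1 / sqrt(Sn(f)), applied in the frequency domain
h_f = 1.0 / np.sqrt(Sn)
n_w = np.fft.irfft(np.fft.rfft(n) * h_f, n=N)

# Periodograms before and after whitening
P_red = np.abs(np.fft.rfft(n)) ** 2
P_white = np.abs(np.fft.rfft(n_w)) ** 2
print("low/high power ratio before:", P_red[1:10].mean() / P_red[-100:].mean())
print("low/high power ratio after: ", P_white[1:10].mean() / P_white[-100:].mean())
```

Multiplying in the frequency domain corresponds to circular convolution, which is acceptable for this simulated, periodic example but needs care with real data.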
Prewhitening is important in both detection and estimation applications.
Prewhitening in the least-squares estimation context:
Consider our standard linear model
y = Xθ + n,
which has a least-squares solution for the parameter vector
\[ \theta = \left( X^\dagger C_n^{-1} X \right)^{-1} X^\dagger C_n^{-1} y, \]
where the covariance matrix of the noise vector n is
\[ C_n = \langle n n^\dagger \rangle. \]
This is also the maximum likelihood solution in the right circumstances (which are?).
As with any covariance matrix, Cn is Hermitian and positive semi-definite. This means that the
quadratic form for an arbitrary vector z satisfies
z†Cnz ≥ 0.
Such matrices can always be factored according to the Cholesky decomposition:
Cn = LL†
where L is a lower-triangular matrix; e.g.
\[
L = \begin{pmatrix}
a & 0 & 0 & 0 \\
b & c & 0 & 0 \\
d & e & f & 0 \\
g & h & i & j
\end{pmatrix}.
\]
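A small numerical instance of the factorization is sketched below. It uses a 4 × 4 real covariance matrix with an assumed exponential form; the value of `rho` is hypothetical.

```python
import numpy as np

N = 4
idx = np.arange(N)
rho = 0.9  # assumed correlation between adjacent samples

# Assumed covariance matrix: C[i, j] = rho^|i - j| (real, symmetric, positive definite)
Cn = rho ** np.abs(idx[:, None] - idx[None, :])

# numpy returns the lower-triangular factor L with Cn = L L^T (L L^dagger for complex input)
L = np.linalg.cholesky(Cn)

print(np.round(L, 3))
print("max |L L^T - Cn| =", np.max(np.abs(L @ L.T - Cn)))
```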
Utility: we can transform the model as follows using L:
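\[ y = L y_w, \qquad X = L X_w. \]
Substituting into the solution vector for θ and using
\[ y^\dagger = (L y_w)^\dagger = y_w^\dagger L^\dagger, \quad X^\dagger = (L X_w)^\dagger = X_w^\dagger L^\dagger, \quad \text{and} \quad C_n^{-1} = (L L^\dagger)^{-1} = (L^\dagger)^{-1} L^{-1} \]
yields
\[
\begin{aligned}
\theta &= \left( X^\dagger C_n^{-1} X \right)^{-1} X^\dagger C_n^{-1} y \\
 &= \Big( X_w^\dagger \underbrace{L^\dagger C_n^{-1} L}_{\equiv I} X_w \Big)^{-1} X_w^\dagger \underbrace{L^\dagger C_n^{-1} L}_{\equiv I}\, y_w \\
 &= \left( X_w^\dagger X_w \right)^{-1} X_w^\dagger y_w.
\end{aligned}
\]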
So what? The solution is identical to the least-squares case where the noise covariance matrix is
diagonal; i.e. the noise vector nw = L⁻¹n has been transformed to white noise, since
⟨nw nw†⟩ = L⁻¹Cn(L†)⁻¹ = I. We have whitened the data.
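The equivalence can be checked numerically. The sketch below assumes a sinusoid of known period, so that the model is linear in the cosine and sine amplitudes, and an analytic red-plus-white covariance matrix. Both choices, and all numerical values, are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(3)
N = 200
t = np.arange(N)
P = 10.23  # period in time bins, assumed known here

# Linear model y = X theta + n with cosine and sine amplitudes as parameters
X = np.column_stack([np.cos(2 * np.pi * t / P), np.sin(2 * np.pi * t / P)])
theta_true = np.array([1.0, 0.5])

# Assumed noise covariance: exponentially correlated red noise plus white noise
Cn = 25.0 * 0.95 ** np.abs(t[:, None] - t[None, :]) + np.eye(N)
L = cholesky(Cn, lower=True)
y = X @ theta_true + L @ rng.standard_normal(N)   # noise drawn with covariance Cn

# Generalized least squares: theta = (X^T Cn^-1 X)^-1 X^T Cn^-1 y
theta_gls = np.linalg.solve(X.T @ np.linalg.solve(Cn, X), X.T @ np.linalg.solve(Cn, y))

# Whitened problem: Xw = L^-1 X, yw = L^-1 y, then ordinary least squares
Xw = solve_triangular(L, X, lower=True)
yw = solve_triangular(L, y, lower=True)
theta_w, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

print("GLS estimate:     ", theta_gls)
print("whitened estimate:", theta_w)
```

Solving the triangular systems avoids forming L⁻¹ explicitly, which is the usual choice for numerical stability.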
When is this useful? An example is the fitting of a sinusoidal function amid red noise where
leakage effects are important just as they are for spectral analysis. A specific example is the fitting
of astrometric parameters or periodicities in radial velocity data.
What’s the catch? You need to know the covariance matrix of the noise n to do the Cholesky
decomposition. This can be easier said than done!
Examples of sine wave + red and white noise
Examples were generated with a signal
\[ y(t) = \cos(2\pi t/P + \phi) + r(t)/\mathrm{snr}_r + w(t)/\mathrm{snr}_w \]
where r and w have unit variance and are scaled by the signal-to-noise ratios snr_r and snr_w,
respectively. The covariance matrix for the combined noise n = r + w was calculated by averaging
Cn = ⟨nn†⟩ over 1000 realizations.
Note that for some real situations where we have only a single time series, we would need to
calculate Cn differently, e.g. from first principles, prior knowledge, etc.
In practice, realizations of r were generated and the mean subtracted. Then white noise was added
to form n and then the Cholesky decomposition was done using the command
L = scipy.linalg.cholesky(Cn, lower=True)
For data vectors of length N, the lower-triangular matrix L is N × N. If the mean had been
subtracted from the white noise as well, the rank of the covariance matrix would be N − 1 and the
decomposition would fail.
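The procedure described above can be sketched as follows. The red-noise generator (white noise shaped by a power law in the frequency domain), the spectral index `si`, and the S/N values are assumptions; the notes do not say how the realizations were actually produced.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(4)
N, n_real = 256, 1000
si = 2.0                   # assumed power-law index, S(f) proportional to f^(-si)
snr_r, snr_w = 0.1, 1.0    # assumed signal-to-noise ratios

def red_noise(rng, N, si):
    """One unit-variance, zero-mean realization of power-law red noise (hypothetical helper)."""
    f = np.fft.rfftfreq(N)
    amp = np.zeros_like(f)
    amp[1:] = f[1:] ** (-si / 2.0)          # no power at DC
    spec = amp * (rng.standard_normal(f.size) + 1j * rng.standard_normal(f.size))
    r = np.fft.irfft(spec, n=N)
    r -= r.mean()                            # mean subtracted from the red noise only
    return r / r.std()

# Combined noise realizations: scaled red noise plus scaled white noise
noise = np.empty((n_real, N))
for k in range(n_real):
    noise[k] = red_noise(rng, N, si) / snr_r + rng.standard_normal(N) / snr_w

# Covariance matrix averaged over realizations, then its Cholesky factor
Cn = noise.T @ noise / n_real
L = cholesky(Cn, lower=True)

# Whiten every realization: n_w = L^-1 n
noise_w = solve_triangular(L, noise.T, lower=True).T
Cw = noise_w.T @ noise_w / n_real            # equals the identity by construction
print("max |Cw - I| =", np.max(np.abs(Cw - np.eye(N))))
```

Because Cn is estimated from the same realizations that are whitened, Cw is the identity to rounding error; an independent realization would be whitened only approximately.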
Results in the following figures indicate that
1. Power-law red noise with spectral indices si ≲ 2 does not benefit particularly from whitening
because leakage is much less.
2. What matters is the signal-to-noise ratio of the cosine relative to the noise contained in one
resolution bandwidth ∆f ∼ T⁻¹ centered on the frequency of the sinusoid. For a steep power law,
only a small fraction of the total power in the red noise is in this band, whereas the flatter the
spectrum, the larger this fraction is.
[Plot: Cholesky whitening, N = 256, Sine+RN+WN, Si = 1.0, S/Nr = 0.01, S/Nw = 1.00. Four panels: columns "Time Series" (x-axis: Time (bins)) and "Spectra" (x-axis: Frequency (bins)); rows "Signal + Noise" and "Noise only".]
Figure 1: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with
additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red)
and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and
whitened time series. Bottom right: power spectra of original and whitened noise sequences.
[Figures 2 through 9: further four-panel plots in the layout of Figure 1 (columns "Time Series" and "Spectra"; rows "Signal + Noise" and "Noise only"). Their titles, parameter values, and captions did not survive extraction.]
Impulse Response and Spectrum of Whitening Filter
We can think of the Cholesky decomposition as a filter that suppresses low frequencies for the
purpose of estimating the parameters of a sinusoid. The filter response can be calculated from the
impulse response as follows:
Construct a data vector i corresponding to i_j = 0 for all j except j = j0, where i_{j0} = 1.
Then the impulse response is h = L⁻¹i. Then, expressed as a time function h_j, j = 1, ..., N, the
frequency-domain response is the squared magnitude of the DFT of h_j:
\[ \tilde{H}_k = |\tilde{h}_k|^2 \]
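A sketch of this calculation follows. It uses an assumed analytic covariance matrix (exponentially correlated red noise plus unit-variance white noise) in place of the simulated one, and places the impulse at mid-record; both choices are for illustration only.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

N = 256
t = np.arange(N)

# Assumed covariance: strong red component plus unit-variance white noise
Cn = 100.0 * 0.98 ** np.abs(t[:, None] - t[None, :]) + np.eye(N)
L = cholesky(Cn, lower=True)

# Impulse: zero everywhere except at index j0
j0 = N // 2
impulse = np.zeros(N)
impulse[j0] = 1.0

# Impulse response h = L^-1 i, via a triangular solve
h = solve_triangular(L, impulse, lower=True)

# Frequency-domain response: squared magnitude of the DFT of h
H = np.abs(np.fft.rfft(h)) ** 2
print("response at lowest nonzero frequency:", H[1])
print("response at highest frequency:       ", H[-1])
```

The response is causal (zero before j0) because L⁻¹ is lower triangular, and for this covariance it is strongly suppressed at low frequencies where the red noise dominates.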
Figure 10: Example of whitening using the Cholesky decomposition along with the impulse response and its spectrum. The signal consists of
a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise
are given. Left figure: Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened
noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise
sequences. Right figure: Top panel: input impulse (red) and impulse response of the Cholesky filter. Bottom Panel: Spectra of the impulse
and impulse response, respectively. The filter shows the suppression of frequencies below about 25 bins; this frequency is signal-to-noise ratio
dependent.