
Physica D 225 (2007) 219–228

www.elsevier.com/locate/physd

A wavelet-based method for surrogate data generation


Christopher J. Keylock ∗
School of Geography, Earth and Biosphere Institute, University of Leeds, Woodhouse Lane, Leeds LS2 9JT, UK

Received 14 April 2006; received in revised form 17 September 2006; accepted 25 October 2006
Available online 28 November 2006
Communicated by C.K.R.T. Jones

Abstract

Hypothesis testing based on surrogate data has emerged as a popular way to test the null hypothesis that a signal is a realisation of a linear, Gaussian, stochastic process. If these surrogates are constrained to the values and power spectrum of the original data, there is no need to formulate a pivotal test statistic. In this paper a method is presented for generating constrained surrogates using a wavelet transform, introducing a threshold above which wavelet detail coefficients are pinned to their original values. Such surrogates avoid problems of nonstationarity for pseudo-periodic data and appear to be more robust than conventional approaches in situations where period modulation is corrupting a Gaussian stochastic process. When used for generating ensemble realisations of a process, the approach used here avoids some of the difficulties of methods based on simple randomisation of wavelet coefficients.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Surrogate data; Wavelet transform; Constrained realisations; Hypothesis testing

∗ Tel.: +44 113 343 3307; fax: +44 113 343 3308.
E-mail address: c.j.keylock@leeds.ac.uk.

0167-2789/$ - see front matter
doi:10.1016/j.physd.2006.10.012

1. Introduction

The Iterated Amplitude Adjusted Fourier Transform (IAAFT) algorithm [16] was developed for testing the null hypothesis that a time-series is a realisation of a linear, Gaussian process. It has become a popular approach [7,11], improving the convergence of a previous method [12,22–24]. The method stores the squared Fourier amplitudes of the data series and then undertakes a random sort of the values. The Fourier transform of the random sort is taken and its squared amplitudes are replaced by those from the original data, with the complex phases retained. The Fourier transform is then inverted and rank-ordering is used to map the values in this surrogate series to those in the original series. This rank-ordering adjustment modifies the spectral response, so the procedure is iterated until no further re-ordering occurs. Hence, the surrogates are constrained to the values of the original series as well as to its behaviour in the Fourier domain. The aim of this paper is to develop a method that employs the wavelet transform to produce realisations of a time-series that preserve the average local properties of the original data [6] and that also generates constrained surrogates for use in hypothesis testing. The wavelet-based approach is shown to have certain advantages when dealing with pseudo-periodic data and with data that may be period-modulated.

Wavelet-based methods related to the one proposed here are currently used in a variety of fields for developing ensemble realisations of a phenomenon [1] or for placing confidence limits on an observed response [2–4,9]. Such methods tend to be based on undertaking a wavelet decomposition of a signal, shuffling the wavelet coefficients and then inverting the transform to produce a new realisation of the original phenomenon. The wavelet power spectrum may be proportional to the variance of the wavelet detail coefficients at each dyadic scale, and this variance is left unchanged by such shuffling. As a result, random realisations are generated that approximately preserve the Fourier spectrum of the original signal. However, such methods are not exact, meaning that a constrained realisation of the data cannot be obtained. One difficulty with this approach emerges from a consideration of the mechanics of the wavelet decomposition, which is based on the dilation and translation of a wavelet function that is convolved with a signal $f(t)$:

$$W_{u,s} = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right)\mathrm{d}t. \qquad (1)$$
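Eq. (1) is the continuous transform. The surrogate methods discussed in this paper instead work with the Maximal Overlap Discrete Wavelet Transform (MODWT) [10]. This is an undecimated discrete relative of (1): dyadic level $j$ plays the role of the scale $s = 2^{j-1}$, and every level retains all $I$ coefficients.

The following Python/NumPy sketch makes two assumptions. It uses the Haar filter, because the paper does not state which wavelet filter was used, and it applies circular boundary conditions. The function names are illustrative. The sketch also checks that the MODWT partitions the signal's energy across levels, which is why the wavelet variance at each scale can stand in for the power spectrum.

```python
import numpy as np


def haar_modwt(x, J):
    """Haar MODWT via the pyramid algorithm, with circular boundaries.

    Returns (W, V): W has shape (J, N), row j-1 holding the level-j detail
    coefficients; V holds the level-J smooth (approximation) coefficients.
    """
    v = np.asarray(x, dtype=float)
    W = np.empty((J, v.size))
    for j in range(1, J + 1):
        lagged = np.roll(v, 2 ** (j - 1))  # V_{j-1, t - 2^(j-1)}, wrapping around
        W[j - 1] = 0.5 * (v - lagged)      # Haar MODWT wavelet filter (1/2, -1/2)
        v = 0.5 * (v + lagged)             # Haar MODWT scaling filter (1/2, 1/2)
    return W, v


def haar_imodwt(W, V):
    """Exact inverse of haar_modwt, working from the coarsest level down."""
    v = np.asarray(V, dtype=float)
    for j in range(W.shape[0], 0, -1):
        k = 2 ** (j - 1)
        w = W[j - 1]
        # (W + V) at t and (V - W) at t + k each equal V_{j-1,t};
        # the inverse averages the two.
        v = 0.5 * (w + v) + 0.5 * (np.roll(v, -k) - np.roll(w, -k))
    return v


rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
W, V = haar_modwt(x, J=6)
sigma_w = np.sqrt(np.mean(W ** 2, axis=1))  # spread of coefficients per level, cf. Fig. 1
```

Each row of `W` is one set of detail coefficients $W_{i,j}$. These are the sets that the wavelet-based surrogate methods below randomise scale by scale.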
As the scale s increases, the variability of f at lower
frequencies is captured, leading to a set of wavelet detail
coefficients whose own periodicity is proportional to s.
Hence, a difficulty with a simple shuffling of the wavelet
coefficients becomes apparent. The inherent periodicity of
the wavelet coefficients at a selected scale is no longer
preserved, meaning that an inverse wavelet transform is
applied to a set of coefficients that were not an admissible
product of the original wavelet transform. This deficiency
was overcome by [6], who used the Iterated Amplitude Adjusted Fourier Transform (IAAFT) of [16] to randomise the wavelet coefficients at each level obtained using the Maximal Overlap Discrete Wavelet Transform (MODWT) [10]. This preserves the original histogram and the Fourier spectrum of the processed signal, meaning that admissible random arrangements of coefficients at each scale can be obtained.

Fig. 1. A time-series X and the standard deviation of the wavelet coefficients ($\sigma_w$) for each dyadic scale.
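The IAAFT iteration summarised in the Introduction is the step that [6] applies to the coefficients at each level. It can be sketched in Python/NumPy as follows. This is a minimal illustration after [16], not the author's code; the function name, the iteration cap and the use of the real FFT are assumptions. The last step of every pass is rank-order matching, so the returned surrogate always contains exactly the original values, while its amplitude spectrum only approximates the original one.

```python
import numpy as np


def iaaft(x, n_iter=100, rng=None):
    """Minimal IAAFT surrogate (after Schreiber and Schmitz [16])."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)
    target_amp = np.abs(np.fft.rfft(x))   # Fourier amplitudes of the data
    s = rng.permutation(x)                # random sort of the values
    prev_ranks = None
    for _ in range(n_iter):
        # Impose the original amplitudes while keeping the current phases.
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=x.size)
        # Rank-order matching: put the original values back, in s's rank order.
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
        if prev_ranks is not None and np.array_equal(ranks, prev_ranks):
            break                         # no further re-ordering occurs
        prev_ranks = ranks
    return s


x = np.cumsum(np.random.default_rng(1).standard_normal(512))  # a random walk
s = iaaft(x, rng=2)
```

Stopping when the ranks no longer change follows the "until no further re-ordering occurs" criterion in the text. The iteration cap is only a safeguard.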
A difficulty with the IAAFT algorithm when generating surrogates from pseudo-periodic data has been identified previously [19,20]. These authors found that a nonstationarity was introduced into IAAFT surrogates. They proposed a surrogate-generating method, known as the PPS method, that circumvented this difficulty. However, their surrogates were not constrained in the sense of [23], meaning that a pivotal test statistic had to be derived [18]. The wavelet-based method developed in this study also overcomes the nonstationarity problem of the IAAFT algorithm. Unlike the PPS method, however, it yields a constrained realisation: the spectrum and the histogram of values are both identical to those of the original data. Hence, there is no need to rely on a pivotal test statistic, and a variety of tests for departure from Gaussianity [17] can be employed. An example is provided showing that, because the wavelet representation better preserves the structure of periodic time series, improved results from hypothesis testing may be obtained when a signal is affected by period modulation.

2. Wavelet surrogate generation

The wavelet-based IAAFT algorithm, or WIAAFT method [6], is outlined graphically in Fig. 2 for the time-series in Fig. 1. Application of the Maximal Overlap Discrete Wavelet Transform (MODWT) to a time-series of length $I = 2^J$ yields $i = 1, \ldots, I$ wavelet detail coefficients $W_i$ at each of $j = 1, \ldots, J$ dyadic scales. Scales $j = 1, \ldots, 6$ for the time-series X of Fig. 1 are given in Fig. 2 as a black line. If we consider each of these sets of wavelet coefficients independently, then we can apply the IAAFT algorithm to each to obtain randomised values that retain the original values and periodicities (upper grey line in each plot). The regularity of the coefficients is thereby preserved, which is not the case when randomising individual coefficients or performing simple block randomisations. The variability of the coefficients therefore remains on the scale of the support of the wavelet, making inversion meaningful.

The method presented in [6] then also generated the transpose of this set of values. It used circular rotation and an error criterion (e.g. least squares) to minimise the difference between the original coefficients and the surrogate sets (middle grey line of Fig. 2). After this operation had been performed for all detail coefficients, the original approximation was added back in and the MODWT inverted to give a surrogate time-series. An iterative procedure identical to the equivalent stages of the IAAFT was then used to recover the original values in the data and to maintain the spectral similarity between time-series and surrogates. Thus, the wavelet decomposition, IAAFT of the detail coefficients, matching and inverse wavelet transform serve as a prequel to the standard IAAFT algorithm. The bottom grey line in Fig. 2 shows the coefficients that would result if the final surrogate were decomposed using the MODWT. The WIAAFT localises the structure in the original signal (Fig. 3), while still randomising nonlinear aspects of the signal. It has been shown [6] that this localisation property has potential for detecting changes in the Hurst exponent for signals that may be approximated by fractional Brownian motion.

Fig. 2. Wavelet coefficients for the first 6 scales of time series X (Fig. 1). Values are displaced from one another by 10 in (e) and by 4 in all other plots for clarity. See the text for an explanation of the stages illustrated here.

Fig. 3. The time-series X (black line) from Fig. 1, three WIAAFT surrogates (light grey) and three IAAFT surrogates (dark grey), each displaced from the other by 5 units for clarity. The localisation property of the WIAAFT surrogates is clear.

Rather than using an optimal matching procedure as above, the wavelet-based surrogate data algorithm presented in this paper introduces a threshold criterion. This criterion is used to 'pin' particular wavelet coefficients at a scale $j$ to the same position $i$ in the wavelet domain for both the original and surrogate data. To distinguish this method from the others considered in this paper, we term it the Pinned WIAAFT (PWIAAFT).

To see how the method works, consider Fig. 4. It shows the normalised wavelet coefficients at dyadic scale 4 for the 256 years of annual sunspot data [26] given in Fig. 5(a). The coefficients were normalised by subtracting the mean (which is ∼0) and dividing by the standard deviation. An IAAFT realisation of these wavelet coefficients is shown as a dotted line in Fig. 4; any alignment between these two signals is purely coincidental. The PWIAAFT algorithm approximately aligns the surrogate with the data by pinning particular coefficients. The dark grey line in Fig. 4 shows a shuffled sequence of the coefficients with some values 'pinned' in place. These are visible as relatively smooth parts of this signal associated with local maxima and minima in the original coefficients. The threshold used here (discussed below) fixes 51 of the 256 coefficients at this scale. Altogether, 629 of the 2048 detail coefficients are fixed for this threshold. The majority of these are in scales 3 and 6, where 386 of the 512 coefficients are fixed, as expected from the wavelet variances in Fig. 5(b).

Fig. 4. The coefficients $W_{i,4}$ for 256 years of sunspot numbers. The original values are given in black, with an IAAFT surrogate as a dotted line. The text explains the stages of the PWIAAFT algorithm shown here as dark and light grey lines.

After the shuffle stage of the IAAFT algorithm (with the pinned coefficients held in place), we obtain the upper light grey line in Fig. 4. Over a series of iterations, irregularities at a frequency higher than that permitted by the support of the wavelet at this scale are removed. The result is a random realisation of $W_{i,4}$ that respects the original spectrum and histogram of values as well as the locations of the major fluctuations in the original. The difference between the top and bottom signals in Fig. 4 is given in Fig. 6, in the same units as Fig. 4; the locations where there is no difference correspond to the pinned locations. Undertaking this procedure over all scales yields random realisations of each set of detail coefficients. These can then be inverted with an inverse wavelet transform to produce a surrogate signal. Following [6], this acts as a preliminary stage to the standard IAAFT algorithm. The rank-order matching and subsequent Fourier transform stages of that algorithm are then used to ensure that a constrained surrogate time series [22,23] is produced. The use of the wavelet transform reduces the range of possible realisations of the surrogates, to a degree that varies with the value of the threshold. As demonstrated below, for a reasonable choice of threshold this restriction is not sufficient to prevent legitimate hypothesis testing.

Fig. 5. Sunspot numbers from 1700 to 1955 (a) together with the wavelet variances at 8 dyadic scales (b). Data courtesy of SIDC, RWC Belgium, World Data Center for the Sunspot Index, Royal Observatory of Belgium.

Fig. 6. The difference between the original set of coefficients $W_{i,4}$ given in Fig. 4 and the set obtained using the PWIAAFT algorithm. The units for the ordinate are those used in Fig. 4.

3. Threshold determination

The wavelet power spectrum is proportional to the variance of the $W_i$ at each $j$, and $\langle W_i \rangle_j \sim 0$. Consequently, $W_{i,j}^2$ indicates the local energy at a particular location and scale, and the threshold is defined as a function of the squared values of $W_{i,j}$. A measure of the total energy $K$ is given by:

$$K = \sum_{i=1}^{I} \sum_{j=1}^{J} W_{i,j}^2. \qquad (2)$$

In this paper we base our threshold on the unpinned energy $K_{up}$. If the pinned wavelet coefficients are denoted by $\bar{W}_{i,j}$, then the pinned energy is

$$K_p = \sum_{i=1}^{I} \sum_{j=1}^{J} \bar{W}_{i,j}^2 \qquad (3)$$

and the threshold $t^*$ is given by:

$$t^* = 1 - \frac{K_p}{K} = \frac{K_{up}}{K}. \qquad (4)$$

Fig. 7 indicates how various properties of four different time series relate to the choice of $t^*$. Each time series consisted of 1024 values and was obtained as explained in Table 1, where $\varepsilon$ is zero-mean, unit-variance Gaussian noise. The black solid line gives the squared coefficient value at the threshold, $K(t^*)$, as a fraction of the maximum squared value of any coefficient, $K(t^*)/\max(W_{i,j}^2)$. The black dotted line gives the proportion of the wavelet coefficients that are fixed, as a function of $t^*$. For example, in Fig. 7(a), for $K_{up}/K = t^* = 0.40$, approximately 15% of the coefficients are pinned (containing 60% of the energy), and $K(t^*)/\max(W_{i,j}^2)$ is also approximately 15%.

If a threshold of $t^* = 0$ is chosen, the surrogate and original will be identical. The threshold must therefore permit enough variability that, after the inverse transform, the ranked value of the preliminary surrogate at position $i$ differs from that of the original signal. This matters because the next stage of the IAAFT algorithm replaces all values in the surrogate with those from the original using rank-order matching. In this regard, the proportion of energy pinned at each location $i$ (integrated over all scales $j$) is important. It is given by $K_p(i)/K(i)$, where

$$K_p(i) = \sum_{j=1}^{J} \bar{W}_{i,j}^2, \qquad K(i) = \sum_{j=1}^{J} W_{i,j}^2. \qquad (5)$$

The solid grey line in Fig. 7 shows $\langle K_p(i)/K(i) \rangle$, with the standard deviation given by the grey dotted line.

The correlation between the original values for AR2b and AR2p and the values derived from a single surrogate obtained with varying $t^*$ is shown in Figs. 8 and 9. For $t^* = 0.01$ there is an almost perfect correlation and little randomisation has taken place. As $t^*$ increases, the coherence with the original data is reduced. From Fig. 7 it is clear that, for $t^* = 0.10$, twice as many coefficients are pinned for the AR2b series as for AR2p. However, the latter series has a peaked energy spectrum (Fig. 10), so a threshold of $t^* = 0.10$ is less effective at randomising its values. This comparison of broad and peaked spectra therefore suggests that $t^* = 0.10$ is an approximate lower bound for a threshold that ensures satisfactory randomisation. This use of a threshold to add "noise" to the surrogate is qualitatively similar to the approach adopted by [19,20]. The difference is that our approach operates in the wavelet domain, while their method was applied to the embedded time series.

Table 1
The definition of the four time series used in Fig. 7 and the manner in which they were obtained

Signal | Description | Explanation
AR2b | Second-order autoregressive process with a broad energy spectrum | $x(t) = 0.8x_{t-1} - 0.25x_{t-2} + \varepsilon$. Obtain 2048 values and discard the first half.
AR2p | Second-order autoregressive process with a peaked energy spectrum | $x(t) = 1.59x_{t-1} - 0.96x_{t-2} + \varepsilon$. Obtain 2048 values and discard the first half.
Rch | Chaotic realisation from the Rössler system [15] | $\mathrm{d}q/\mathrm{d}t = -(r + s)$, $\mathrm{d}r/\mathrm{d}t = q + \alpha r$, $\mathrm{d}s/\mathrm{d}t = \beta + s(q - \gamma)$; $\alpha = 0.398$, $\beta = 2$, $\gamma = 4$. Integrate for 20 480 steps with a time step of 0.1, discard the first half and regularly sample 1 in 10 of the remaining values of $r$.
Rcy | Cyclic realisation from the Rössler system [15] | Same equations, integration and sampling as Rch, with $\alpha = 0.3909$, $\beta = 2$, $\gamma = 4$.
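The four test signals of Table 1 can be generated along the lines of the Python/NumPy sketch below. The text does not specify the integrator, the initial state of the Rössler system or the random seeds. Fixed-step fourth-order Runge–Kutta and the initial state (1, 1, 0) are therefore choices made here, and the function names are hypothetical.

```python
import numpy as np


def ar2(a1, a2, n=1024, rng=None):
    """x_t = a1 x_{t-1} + a2 x_{t-2} + eps: generate 2n values, keep the last n."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(2 * n)
    x = np.zeros(2 * n)
    for t in range(2, 2 * n):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + eps[t]
    return x[n:]


def rossler_r(alpha, beta=2.0, gamma=4.0, dt=0.1, n_steps=20480, y0=(1.0, 1.0, 0.0)):
    """r-component of the Rossler system in Table 1, integrated with RK4.

    The first half of the n_steps values is discarded and every 10th
    remaining value of r is kept (1024 values for the defaults).
    """
    def f(y):
        q, r, s = y
        return np.array([-(r + s), q + alpha * r, beta + s * (q - gamma)])

    y = np.array(y0, dtype=float)
    r_values = np.empty(n_steps)
    for k in range(n_steps):
        k1 = f(y)
        k2 = f(y + 0.5 * dt * k1)
        k3 = f(y + 0.5 * dt * k2)
        k4 = f(y + dt * k3)
        y = y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        r_values[k] = y[1]
    return r_values[n_steps // 2::10]


ar2b = ar2(0.8, -0.25, rng=0)
ar2p = ar2(1.59, -0.96, rng=1)
rch = rossler_r(alpha=0.398)
rcy = rossler_r(alpha=0.3909)
```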

Fig. 7. Properties of the four signals described in Table 1 as a function of the threshold $t^* = K_{up}/K$. The black solid line is $K(t^*)/\max(W_{i,j}^2)$. The black dotted line is the number of pinned detail coefficients as a proportion of the total number of such coefficients. The grey solid line is $\langle K_p(i)/K(i) \rangle$, and the grey dotted line is the standard deviation of $K_p(i)/K(i)$.
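The PWIAAFT step for a single scale, described in Section 2, shuffles the unpinned coefficients and then iterates with the pinned coefficients held in place. It can be sketched as below. This is an illustration under stated assumptions, not the author's implementation:

- pinned entries are simply re-imposed after every pass;
- unpinned entries are rank-matched only among the unpinned values;
- for brevity, the example pins the largest 20% of squared values rather than deriving the pinned set from $t^*$.

The function name is hypothetical.

```python
import numpy as np


def pinned_iaaft(w, pinned, n_iter=200, rng=None):
    """IAAFT of one scale's detail coefficients with some entries held fixed.

    Unpinned entries are only ever permuted among themselves, so the
    histogram of the coefficients is preserved exactly; the Fourier
    amplitudes are matched approximately.
    """
    rng = np.random.default_rng(rng)
    w = np.asarray(w, dtype=float)
    free = ~np.asarray(pinned, dtype=bool)
    free_vals = np.sort(w[free])               # values available to free slots
    amp = np.abs(np.fft.rfft(w))               # target Fourier amplitudes
    s = w.copy()
    s[free] = rng.permutation(w[free])         # shuffle, pinned values in place
    for _ in range(n_iter):
        phases = np.angle(np.fft.rfft(s))
        y = np.fft.irfft(amp * np.exp(1j * phases), n=w.size)
        # Rank-order the free slots against the free values only.
        ranks = np.argsort(np.argsort(y[free]))
        s_new = w.copy()                       # pinned entries keep original values
        s_new[free] = free_vals[ranks]
        if np.array_equal(s_new, s):
            break                              # no further re-ordering
        s = s_new
    return s


rng = np.random.default_rng(3)
# A smooth, coefficient-like test sequence (moving average of white noise).
w = np.convolve(rng.standard_normal(256), np.ones(8) / 8, mode="same")
pinned = w ** 2 >= np.quantile(w ** 2, 0.8)
s = pinned_iaaft(w, pinned, rng=4)
```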

Table 2
The mean probability per datum of the value for the surrogate replicating that for the data, using the first 128 values of the Rch time series

I | IAAFT | PWIAAFT (t* = 0.01) | PWIAAFT (t* = 0.03) | PWIAAFT (t* = 0.10) | PWIAAFT (t* = 0.30) | PWIAAFT (t* = 1.00)
128 | 0.0073 | 0.4211 | 0.2530 | 0.0569 | 0.0403 | 0.0061
256 | 0.0045 | 0.1347 | 0.0480 | 0.0217 | 0.0234 | 0.0031
512 | 0.0031 | 0.1114 | 0.0316 | 0.0095 | 0.0098 | 0.0016
1024 | 0.0008 | 0.0966 | 0.0186 | 0.0080 | 0.0050 | 0.0012
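The similarity measure reported in Table 2 (described in the paragraph that follows) can be computed as in the Python/NumPy sketch below. The surrogates are assumed to be stacked as rows of an array, and the function name is hypothetical. Random shuffles stand in for IAAFT surrogates purely to show the $\sim 1/I$ baseline.

```python
import numpy as np


def replication_probability(x, surrogates, n_first=128):
    """Mean over the first n_first positions of the fraction of surrogates
    whose value equals the data value at that position (cf. Table 2)."""
    x = np.asarray(x)[:n_first]
    S = np.asarray(surrogates)[:, :n_first]
    return float(np.mean(S == x))   # x is broadcast across the surrogate rows


rng = np.random.default_rng(0)
x = rng.standard_normal(512)
shuffles = np.array([rng.permutation(x) for _ in range(2000)])
p = replication_probability(x, shuffles)   # should be close to 1/512 for pure shuffles
```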

The chosen threshold needs to be large enough that the surrogates are a random realisation of the original data; otherwise hypothesis testing will not be robust. Table 2 gives the mean similarity between surrogates and the Rch (Table 1) data as a function of the length of the data $I$ and the manner in which the surrogates are generated. Surrogates were derived for a time-series of length $I$, and the first 128 values in the surrogate were compared to the first 128 in Rch. Table 2 records the average, over all 128 locations, of the proportion of matching occurrences for $I$ varying from 128 to 2048. As expected, the values for the IAAFT and PWIAAFT ($t^* = 1.00$) surrogates are $\sim 1/I$. Other choices of $t^*$ lead to much greater values. However, except for the smallest datasets and the lowest choices of $t^*$, the probability of a surrogate replicating the data or another surrogate is extremely low. A comparison with the results for the IAAFT surrogates suggests that a dataset of length $I > 512$ can legitimately be analysed with $t^* = 0.10$.

Fig. 8. The relation between values for AR2b and values derived from a surrogate with a varying choice of $t^*$. Fig. 8(f) shows an IAAFT surrogate for comparison.

Fig. 9. The relation between values for AR2p and values derived from a surrogate with a varying choice of $t^*$. Fig. 9(f) shows an IAAFT surrogate for comparison.

4. Testing the algorithm

The first test employed to compare the IAAFT and PWIAAFT algorithms was based on the time asymmetry $A$ [17]:

$$A(\psi) = \frac{\left\langle (x_t - x_{t-\psi})^3 \right\rangle}{\left\langle (x_t - x_{t-\psi})^2 \right\rangle^{3/2}}, \qquad (6)$$

where $x_t$ is the value of a datum at time $t$ and $\psi$ is the separation (lag) in time. A two-tailed test at a particular significance level $g$ can be formulated by generating $2/g - 1$ surrogates. We employed (6) at the 5% level, meaning that 39 surrogates were generated. The difference between $A$ for the data and for the surrogates was assessed in two ways. The first was the ranked position of the data asymmetry with respect to the surrogates. The second was a $z$ score: the distance between the asymmetry of the data and the mean asymmetry of the surrogates, divided by the standard deviation of the surrogates' asymmetry ($|z| > 1.96$ indicates a significant difference at the 5% level). Each time series (and surrogate) contained 1024 values, and the testing process was repeated 50 times in order to determine the robustness of the results. The data series used were AR2b and AR2p, for which the null hypothesis should be true.

Fig. 10. Power spectra (average of 100 realisations of the signal) and time series plots for AR2b (black line) and AR2p (grey line). The ordinate in (b) is standardised by the mean and standard deviation, with the values for AR2p displaced by −5 for clarity.

Fig. 11. Mean values and ±2 standard deviation error bars for the $z$ scores from the time asymmetry test $A(\psi = 1, \ldots, 5)$ for the AR2b and AR2p time series. Results using the IAAFT method are in black and PWIAAFT surrogates are in grey. For the latter, the horizontal line is $t^* = 0.10$, the circle is $t^* = 0.20$, and the diamond is $t^* = 0.30$. Values are displaced horizontally a small distance from the respective integer value of $\psi$ for clarity.

Fig. 11 shows the results for AR2b and AR2p based on $z$ scores. No rejections of the null hypothesis appear to occur at the 5% level if we assume normality of the surrogates. For AR2b, the results for IAAFT surrogates and those for $t^* = 0.10$ are similar. For AR2p, however, there is a significant difference between the IAAFT and PWIAAFT surrogates, with the IAAFT surrogates showing a clear tendency to lie further from the data. Testing based on rank ordering for AR2p IAAFT surrogates gave a significant difference at the 5% level for one realisation out of 50 of the AR2p process, at $\psi = 4$. However, there were a total of 9 rejections at the 10% level for $\psi = 1, 2, 3, 4, 5$. For AR2b IAAFT surrogates, there were 2 rejections at the 10% level for $\psi = 1, 2, 3, 4, 5$. No incorrect rejections of the null hypothesis for PWIAAFT surrogates occurred for either time series at any value of $\psi$ or $t^*$. The pseudo-periodic nature of AR2p would appear to be responsible for some of the difficulties with the IAAFT algorithm [19,20]. Fig. 12 shows the artefacts introduced into an IAAFT surrogate of the pseudo-periodic Rcy data, in comparison with the PWIAAFT method.

Fig. 12. The cyclic path on the Rössler attractor Rcy (black line) together with an IAAFT surrogate (dark grey line) and a PWIAAFT ($t^* = 0.10$) surrogate (light grey line). The surrogates are displaced from the data by −10 and −20, respectively, for clarity.

This nonstationarity in the surrogates could be a problem if one wished to preserve additional properties of the signal for more complex tests. For example, consider the pointwise Hölder exponent of some function $f$ at a position $x_i$, denoted $\alpha_p(x_i)$. This is given by the supremum of the set of non-integer real numbers $\alpha$ that fulfil

$$\left| f(x) - P_{x_i}(x) \right| \le K \left| x - x_i \right|^{\alpha}, \qquad (7)$$

where $K > 0$ and $P_{x_i}(x)$ is a polynomial of degree less than $\alpha$ [8]. Fig. 13 gives $\alpha_p(x_i)$ for the data and surrogates in Fig. 12, computed with the method of [5] over a moving window 32 values wide. The reduction in the variance for the second half of the IAAFT surrogate in Fig. 12 has an impact on the values of $\alpha_p(x_i)$. These are smoother than average in the second half of Fig. 13(b) and more intermittent in the first half. Suppose one is using a surrogate-based algorithm to generate ensembles of a process that meet particular criteria, including the stationarity of the pointwise Hölder exponents. In that case the PWIAAFT method may be preferable to the IAAFT approach.

Fig. 13. Estimates of the pointwise Hölder exponents $\alpha_p(x_i)$ for the Rcy signal (a), the IAAFT surrogate (b) and the PWIAAFT surrogate (c) given in Fig. 12.

To complement the results above based on values of $A$, we also undertook statistical tests for the difference in the maximal Lyapunov exponent $\lambda$. In this case we compared the Rch and Rcy signals defined in Table 1. Our method for calculating $\lambda$ followed [14] and reconstructed the trajectory using the method of delays [21]. We form $M = n - (d^* - 1)L$ vectors $\mathbf{X}_i$, each containing the values

$$\mathbf{X}_i = \left[\, x_i \;\; x_{i+L} \;\; \cdots \;\; x_{i+(d^*-1)L} \,\right], \qquad (8)$$

where $n$ is the number of values in the time series and $d^*$ is the embedding dimension (considered to be a variable). $L$ is the lag, taken here to be the time at which the autocorrelation function drops to $1 - 1/e$ [14]. For each point $\mathbf{X}_i$ in the embedding space given by (8), we determined its nearest neighbour. We added the constraint that the two points be separated in time by more than the mean period of the time series, $T_{TH}$ (the Theiler correction). This period was determined from the Fourier transform of the series by

$$T_{TH} = \frac{\int E(f)\,\mathrm{d}f}{\int f\,E(f)\,\mathrm{d}f}, \qquad (9)$$

where $f$ is frequency and $E(f)$ is the spectral energy. For a particular target position $\mathbf{X}_i$ in the embedding space, the distance to the nearest neighbour is

$$D_i(0) = \min_{\mathbf{X}_k} \left\lVert \mathbf{X}_k - \mathbf{X}_i \right\rVert, \qquad (10)$$

where $\lVert \cdot \rVert$ is the Euclidean norm and $k \ne i$. The maximal Lyapunov exponent $\lambda$ was then found by tracking the increase in $D_i$ with time as these two neighbouring positions followed their trajectories in the embedding space. The natural logarithm of these distances was averaged over all nearest-neighbour pairs $k$. $\lambda$ was then estimated from a straight-line fit of $\langle \ln D_k(t) \rangle$ against $t$.

The unknown in this approach is $d^*$. Rosenstein et al. [14] note that $d^*$ must exceed the topological dimension of the system (i.e. 3 here) to detect chaos. Otherwise, the chaotic system is embedded in a space too small to reveal its true dynamics, and it appears stochastic. In this study we report results using $d^* = 10$. Because we expect a higher value of $\lambda$ for chaotic systems, we used a one-tailed test and generated 19 surrogates for a significance level of 5%. Fig. 14 shows our results for IAAFT and PWIAAFT ($t^* = 0.10$) surrogates in terms of plots of $\langle \ln D_k(t) \rangle$ against $t$. Clearly, the choice of $t^* = 0.10$ is sufficient to produce surrogates that have lost their chaotic structure, enabling hypothesis testing to be undertaken successfully.

5. Power of the surrogates with respect to period modulation

The AR2p signal is the same as that used by [25] for testing the power of surrogate-based tests with respect to nonstationarity. Following [25], the parameters $a_1$ and $a_2$ of an AR2 process

$$x(t) = a_1 x_{t-1} + a_2 x_{t-2} + a_3 \varepsilon \qquad (11)$$

can be interpreted in terms of a damped oscillator, with period $T$ and relaxation time $\tau$:

$$a_1 = 2\cos\left(\frac{2\pi}{T}\right)\exp\left(-\frac{1}{\tau}\right), \qquad (12)$$

$$a_2 = -\exp\left(-\frac{2}{\tau}\right). \qquad (13)$$

The parameter sets given in Table 1 ($a_1 = 0.8$, $a_2 = -0.25$, $a_3 = 1.0$) and ($a_1 = 1.586$, $a_2 = -0.961$, $a_3 = 1.0$) lead to values of ($T = 9.764$, $\tau = 1.443$) and ($T = 10$, $\tau = 50$), respectively. In the former case the relaxation time is much shorter than the period, resulting in a broad spectrum with no dominant periodicity. In the latter case a much stronger periodicity, corresponding to $T$, can be observed (Fig. 10). Period-based modulation was undertaken by subjecting the mean period $T$ of the AR2p signal to a sinusoidal fluctuation of the form:

$$T(t) = T + M_T \sin\left(\frac{2\pi t}{T_{mod}}\right), \qquad (14)$$

where $T_{mod}$ was given a value of 250 and $M_T$ was varied from 0 to 4.5 in steps of 1.5. Eq. (14) introduces a temporal dependency in $a_1$:

$$a_1(t) = 2\cos\left(\frac{2\pi}{T(t)}\right)\exp\left(-\frac{1}{\tau}\right). \qquad (15)$$

However, (14) also introduces a time dependency in the variance, which was compensated for using:

$$a_3(t)^2 = \frac{a_3^2}{1 - a_1^2 - a_2^2 - 2a_1^2 a_2/(1 - a_2)} \left(1 - a_1(t)^2 - a_2^2 - \frac{2a_1(t)^2 a_2}{1 - a_2}\right). \qquad (16)$$

We generated 39 surrogates using the IAAFT and PWIAAFT algorithms and compared the output in terms of $A(\psi = 1, \ldots, 15)$. Fig. 15 shows example time series for different choices of $M_T$. Fig. 16 shows the resulting $z$ scores, with ±2 standard deviation limits obtained from 25 realisations of the test. The critical value $|z| = 1.96$ is also shown as a dotted line. The wavelet-based surrogates preserve the time–frequency structure of the original signal to a certain degree and do not themselves introduce nonstationarity. They therefore appear more robust for testing the null hypothesis that a signal is a linear, Gaussian stochastic process when period modulation might affect the test. One situation where such an effect may be present is the analysis of palaeoclimatic time-series [13].
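Sections 4 and 5 use the time-asymmetry statistic of Eq. (6) together with two ways of comparing data with surrogates: a rank test with $2/g - 1$ surrogates and a $z$ score. Both are sketched below in Python/NumPy. The function names are hypothetical. Random shuffles of a deliberately time-irreversible sawtooth stand in for IAAFT or PWIAAFT surrogates, purely to exercise the code.

```python
import numpy as np


def time_asymmetry(x, psi):
    """A(psi) of Eq. (6) for an integer lag psi >= 1."""
    x = np.asarray(x, dtype=float)
    d = x[psi:] - x[:-psi]                     # x_t - x_{t-psi}
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


def surrogate_test(x, surrogates, psi):
    """Return (z, rank_reject) for A(psi).

    rank_reject is the two-tailed rank test: with n = 2/g - 1 surrogates,
    the null is rejected at level g if the data's A is the largest or the
    smallest of the n + 1 values.
    """
    a_data = time_asymmetry(x, psi)
    a_surr = np.array([time_asymmetry(s, psi) for s in surrogates])
    z = (a_data - a_surr.mean()) / a_surr.std(ddof=1)
    rank_reject = bool(a_data > a_surr.max() or a_data < a_surr.min())
    return z, rank_reject


rng = np.random.default_rng(5)
saw = np.tile(np.arange(10.0), 50)                      # slow rise, sudden drop
stand_ins = [rng.permutation(saw) for _ in range(39)]   # 39 = 2/0.05 - 1
z, reject = surrogate_test(saw, stand_ins, psi=1)
```

Under the normality assumption stated in Section 4, $|z| > 1.96$ then corresponds to the 5% criterion used in Figs. 11 and 16.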
Fig. 14. Graphs of $\langle \ln D_k(t) \rangle$ against $t$ for $d^* = 10$. The signals Rch (a), (c) and Rcy (b), (d) are analysed using 19 IAAFT (c), (d) and PWIAAFT ($t^* = 0.10$) (a), (b) surrogates.
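Eqs. (11)–(16) can be turned into a generator for the period-modulated AR2p series of Fig. 15, together with the inverse of Eqs. (12)–(13). This Python/NumPy sketch is illustrative. The burn-in length, the starting phase of the modulation and the function names are assumptions not stated in the text.

```python
import numpy as np


def ar2_coeffs(T, tau):
    """Eqs. (12)-(13): AR(2) coefficients of a damped oscillator."""
    a1 = 2.0 * np.cos(2.0 * np.pi / T) * np.exp(-1.0 / tau)
    a2 = -np.exp(-2.0 / tau)
    return a1, a2


def oscillator_params(a1, a2):
    """Invert Eqs. (12)-(13) for the period T and relaxation time tau."""
    tau = -2.0 / np.log(-a2)
    T = 2.0 * np.pi / np.arccos(a1 / (2.0 * np.sqrt(-a2)))
    return T, tau


def variance_factor(a1, a2):
    """Denominator in Eq. (16): the AR(2) variance is a3^2 / variance_factor."""
    return 1.0 - a1 ** 2 - a2 ** 2 - 2.0 * a1 ** 2 * a2 / (1.0 - a2)


def period_modulated_ar2(n=1024, T=10.0, tau=50.0, M_T=1.5, T_mod=250.0,
                         a3=1.0, burn_in=1024, rng=None):
    """AR(2) whose period follows T(t) = T + M_T sin(2 pi t / T_mod)."""
    rng = np.random.default_rng(rng)
    a1_0, a2 = ar2_coeffs(T, tau)
    n_tot = n + burn_in
    t = np.arange(n_tot)
    T_t = T + M_T * np.sin(2.0 * np.pi * t / T_mod)     # Eq. (14)
    a1_t, _ = ar2_coeffs(T_t, tau)                      # Eq. (15)
    # Eq. (16): rescale the noise so the stationary variance stays fixed.
    a3_t = a3 * np.sqrt(variance_factor(a1_t, a2) / variance_factor(a1_0, a2))
    eps = rng.standard_normal(n_tot)
    x = np.zeros(n_tot)
    for k in range(2, n_tot):
        x[k] = a1_t[k] * x[k - 1] + a2 * x[k - 2] + a3_t[k] * eps[k]
    return x[burn_in:]


T_b, tau_b = oscillator_params(0.8, -0.25)   # about (9.764, 1.443) for AR2b
series = {M_T: period_modulated_ar2(M_T=M_T, rng=0) for M_T in (0.0, 1.5, 3.0, 4.5)}
```

With $M_T = 0$ the noise scaling reduces to $a_3$, and the generator reproduces the unmodulated AR2p process.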

Fig. 15. Example time series obtained by period modulation of AR2p for varying values of $M_T$.

Fig. 16. Mean values and ±2 standard deviation error bars for the $z$ scores from the time asymmetry test $A(\psi = 1, \ldots, 15)$ for the AR2p time series with varying degrees of period modulation (Fig. 15). Results using the IAAFT method are in black and PWIAAFT surrogates are in grey.

6. Conclusion

A method for generating surrogate data in the wavelet domain has been presented. It applies the Iterated Amplitude Adjusted Fourier Transform algorithm to the wavelet detail coefficients at each scale. This avoids the problems of block or coefficient-based randomisation methods, which introduce variability at frequencies higher than those permitted by the support of the wavelet. The new method, which we term the Pinned Wavelet Iterated Amplitude Adjusted Fourier Transform method (PWIAAFT), introduces a threshold that fixes particular coefficients in place while randomising the remainder. This provides some temporal localisation of the surrogates. We present evidence that tests of the null hypothesis that data are generated by a Gaussian stochastic process can be accomplished successfully using tests based on time asymmetry and the calculation of the maximal Lyapunov exponent. In particular, the method appears to be more robust than the standard IAAFT approach to period modulation of a Gaussian process. It also does not introduce artefacts into pseudo-periodic data, artefacts that may invalidate the use of the IAAFT algorithm for testing particular hypotheses.
Acknowledgements

The author gratefully acknowledges support from the EU 5th Framework project SATSIE (contract number: EVG1-CT2002-00059) and the comments of the two anonymous referees.

References

[1] C. Angelini, D. Cava, G. Katul, B. Vidakovic, Resampling hierarchical processes in the wavelet domain: A case study using atmospheric turbulence, Physica D 207 (2005) 27–40.
[2] M. Breakspear, M. Brammer, P.A. Robinson, Construction of multivariate surrogate sets from nonlinear data using the wavelet transform, Physica D 182 (2003) 1–22.
[3] M. Breakspear, M.J. Brammer, E.T. Bullmore, P. Das, L.M. Williams, Spatiotemporal wavelet resampling for functional neuroimaging data, Hum. Brain Mapp. 23 (2004) 1–25.
[4] E. Bullmore, C. Long, J. Suckling, J. Fadili, G. Calvert, F. Zelaya, T.A. Carpenter, M. Brammer, Colored noise and computational inference in neurophysiological (fMRI) time series analysis: Resampling methods in time and wavelet domains, Hum. Brain Mapp. 12 (2001) 61–78.
[5] K.M. Kolwankar, J. Lévy Véhel, A time domain characterization of the fine local regularity of functions, J. Fourier Anal. Appl. 8 (2002) 320–334.
[6] C.J. Keylock, Constrained surrogate time series with preservation of the mean and variance structure, Phys. Rev. E 73 (2006) 036707.
[7] K. Lehnertz, R.G. Andrzejak, J. Arnhold, T. Kreuz, F. Mormann, C. Rieke, G. Widman, C.E. Elger, Nonlinear EEG analysis in epilepsy: Its possible use for interictal focus localization, seizure anticipation, and prevention, J. Clin. Neurophysiol. 18 (2001) 209–222.
[8] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1999.
[9] R.S. Patel, D. Van De Ville, F.D. Bowman, Determining significant connectivity by 4D spatiotemporal wavelet packet resampling of functional neuroimaging data, NeuroImage 31 (2006) 1142–1155.
[10] D.B. Percival, A.T. Walden, Wavelet Methods for Time Series Analysis, Cambridge University Press, 2000.
[11] D. Poggi, A. Porporato, L. Ridolfi, J.D. Albertson, G.G. Katul, Interaction between large and small scales in the canopy sublayer, Geophys. Res. Lett. 31 (2004) L05102, doi:10.1029/2003GL018611.
[12] C.P. Price, D. Prichard, The nonlinear response of the magnetosphere: 30 October 1978, Geophys. Res. Lett. 20 (1993) 771–774.
[13] J.A. Rial, Pacemaking the ice ages by frequency modulation of earth's orbital eccentricity, Science 285 (1999) 564–568.
[14] M.T. Rosenstein, J.J. Collins, C.J. De Luca, A practical method for calculating largest Lyapunov exponents from small data sets, Physica D 65 (1993) 117–134.
[15] O.E. Rössler, An equation for continuous chaos, Phys. Lett. A 57 (1976) 397–398.
[16] T. Schreiber, A. Schmitz, Improved surrogate data for nonlinearity tests, Phys. Rev. Lett. 77 (1996) 635–638.
[17] T. Schreiber, A. Schmitz, Discrimination power of measures for nonlinearity in a time series, Phys. Rev. E 55 (1997) 5443–5447.
[18] M. Small, K. Judd, Correlation dimension: A pivotal statistic for non-constrained realizations of composite hypotheses in surrogate data analysis, Physica D 120 (1998) 386–400.
[19] M. Small, C.K. Tse, Applying the method of surrogate data to cyclic time series, Physica D 164 (2002) 187–201.
[20] M. Small, D. Yu, R.G. Harrison, Surrogate test for pseudoperiodic time series data, Phys. Rev. Lett. 87 (2001) 188101.
[21] F. Takens, Detecting strange attractors in turbulence, Lect. Notes Math. 898 (1981) 366–381.
[22] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, J.D. Farmer, Testing for nonlinearity in time series: the method of surrogate data, Physica D 58 (1992) 77–94.
[23] J. Theiler, D. Prichard, Constrained-realization Monte-Carlo method for hypothesis testing, Physica D 94 (1996) 221–235.
[24] J. Theiler, P.E. Rapp, Re-examination of the evidence for low-dimensional, nonlinear structure in the human brain electroencephalogram, Electroencephalogr. Clin. Neurophysiol. 98 (1996) 213–222.
[25] J. Timmer, Power of surrogate data testing with respect to nonstationarity, Phys. Rev. E 58 (1998) 5153–5156.
[26] R.A.M. Van der Linden and the SIDC team, Online catalogue of the sunspot index. http://www.sidc.be/sunspot-data/.
