A Wavelet-Based Method For Surrogate Data Generation
Christopher J. Keylock
www.elsevier.com/locate/physd
Received 14 April 2006; received in revised form 17 September 2006; accepted 25 October 2006
Available online 28 November 2006
Communicated by C.K.R.T. Jones
Abstract
Hypothesis testing based on surrogate data has emerged as a popular way to test the null hypothesis that a signal is a realisation of a linear
Gaussian, stochastic process. If these surrogates are constrained to the values and power spectrum of the original data there is no need to formulate
a pivotal test statistic. In this paper a method is presented for generating constrained surrogates using a wavelet transform, introducing a threshold
above which wavelet detail coefficients are pinned to their original values. Such surrogates avoid problems of nonstationarity for pseudo-periodic
data and appear to be more robust than conventional approaches for situations where period modulation is corrupting a Gaussian stochastic
process. When used for generating ensemble realisations of a process, the approach used here avoids some of the difficulties of methods based on
simple randomisation of wavelet coefficients.
© 2006 Elsevier B.V. All rights reserved.
Fig. 2. Wavelet coefficients for the first 6 scales of time series X (Fig. 1). Values are displaced from one another by 10 in (e) and 4 in all other plots for clarity. See
the text for an explanation of the stages illustrated here.
Fig. 3. The time-series X (black line) from Fig. 1, three WIAAFT surrogates (light grey) and three IAAFT surrogates (dark grey) each displaced from the other by 5 units for clarity. The localisation property of the WIAAFT surrogates is clear.
Fig. 4. The coefficients Wi,4 for 256 years of sunspot numbers. The original values are given in black, with an IAAFT surrogate as a dotted line. The text explains the stages of the PWIAAFT algorithm shown here as dark and light grey lines.
minima in the original coefficients. The threshold used here (to be discussed below) fixes 51 of 256 coefficients at this scale. Altogether, 629 from 2048 detail coefficients are fixed for this threshold, with the majority (386 from 512) fixed in scales 3 and 6, as expected from the wavelet variances in Fig. 5(b). After the shuffle stage of the IAAFT algorithm (with the pinned coefficients held in place), we obtain the upper light grey line in Fig. 4 and, over a series of iterations, irregularities at a frequency higher than that permitted by the support of the wavelet at this scale are removed to give a random realisation of Wi,4 that respects the original spectrum and histogram of values as well as the locations of the major fluctuations in the original. The difference between the top and bottom signals in Fig. 4 is given in Fig. 6 in the same units as Fig. 4. The locations where there is no difference correspond to the pinned locations. Undertaking this procedure over all scales yields random realisations of each set of detail coefficients that can then be inverted with an inverse wavelet transform to produce a surrogate signal. Following [6], this acts as a preliminary stage to the standard IAAFT algorithm, the rank-order matching and subsequent Fourier transform stages of which are then used to ensure that a constrained surrogate time series [22,23] is produced. The use of the wavelet transform reduces the range of possible realizations of the surrogates. The degree to which
222 C.J. Keylock / Physica D 225 (2007) 219–228
Table 1
The definition of the four time series used in Fig. 7 and the manner in which they were obtained
Fig. 7. Properties of four signals described in Table 1 as a function of the threshold t∗ = K up /K . The black solid line is K (t∗ )/ max(Wi,2 j ), the black dotted line
is the number of pinned detail coefficients as a function of the total number of such coefficients, the grey solid line is K p (i) /K i , while the grey dotted line is the
Table 2
The mean probability per datum of the value for the surrogate replicating that for the data using the first 128 values of the Rch time series
The chosen threshold needs to be sufficiently large that the surrogates are a random realisation of the original data in order for hypothesis testing to be robust. Table 2 gives the mean similarity between surrogates and the Rch (Table 1) data as a function of the length of the data I and the manner in which the surrogates are generated. Surrogates were derived for a time-series of length I and the first 128 values in the surrogate compared to the first 128 in Rch. Table 2 records the average over all 128 locations of the proportion of matching occurrences for I varying from 128 to 2048. As expected the values for the IAAFT and PWIAAFT (t∗ = 1.00) surrogates are ∼1/I and while other choices of t∗ lead to much greater values, for all but the smallest datasets and the lowest choices for t∗, the probability of a surrogate replicating the data or another surrogate is extremely low. A comparison with the results for the IAAFT surrogates would suggest that it is certainly possible to analyse legitimately a dataset of length I > 512 for t∗ = 0.10.
Fig. 8. The relation between values for AR2b and values derived from a surrogate with a varying choice of t∗. Fig. 8(f) shows an IAAFT surrogate for comparison.
Fig. 9. The relation between values for AR2p and values derived from a surrogate with a varying choice of t∗. Fig. 9(f) shows an IAAFT surrogate for comparison.

4. Testing the algorithm

The first test employed to compare the IAAFT and PWIAAFT algorithms was based on the time asymmetry A [17]:

\[ A(\psi) = \frac{\left\langle (x_t - x_{t-\psi})^3 \right\rangle}{\left\langle (x_t - x_{t-\psi})^2 \right\rangle^{3/2}} \qquad (6) \]

where x_t is the value of a datum at time t and ψ is the separation distance. A two-tailed test at a particular significance level g can be formulated by generating 2/g − 1 surrogates. We employed (6) at the 5% level, meaning that 39 surrogates were generated and the degree of difference between A for the data and for surrogates was given both from the ranked position of the data asymmetry with respect to the surrogates and deriving z scores for the surrogates by dividing the distance between the asymmetry of the data and the mean asymmetry of the surrogates by the standard deviation of the surrogates’ asymmetry (with |z| > 1.96, indicating a significant difference at the 5% level). Each time series (and surrogate) contained
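The time asymmetry statistic of Eq. (6), the surrogate count 2/g − 1 and the z score described in this section can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the sawtooth toy signal and the use of plain random shuffles as stand-in surrogates are assumptions of the example. A real test would use IAAFT or PWIAAFT surrogates in place of the shuffles.

```python
import random
import statistics

def time_asymmetry(x, psi=1):
    """A(psi) from Eq. (6): <d^3> / <d^2>^(3/2), with d = x_t - x_{t-psi}."""
    d = [x[t] - x[t - psi] for t in range(psi, len(x))]
    m2 = statistics.fmean(v ** 2 for v in d)
    m3 = statistics.fmean(v ** 3 for v in d)
    return m3 / m2 ** 1.5

def n_surrogates(g):
    """Number of surrogates for a two-tailed test at significance level g."""
    return round(2 / g - 1)

def z_score(a_data, a_surr):
    """Distance of the data from the surrogate mean, in surrogate std devs."""
    return (a_data - statistics.fmean(a_surr)) / statistics.stdev(a_surr)

# Toy data: a sawtooth that rises slowly and drops sharply (time-asymmetric).
data = [t % 10 for t in range(200)]

# Stand-in surrogates: plain random shuffles (NOT IAAFT or PWIAAFT).
rng = random.Random(0)
surrogates = [rng.sample(data, len(data)) for _ in range(n_surrogates(0.05))]

a_data = time_asymmetry(data, psi=1)
a_surr = [time_asymmetry(s, psi=1) for s in surrogates]

# Ranked position of the data among the surrogate values (1 = lowest).
rank = 1 + sum(a < a_data for a in a_surr)
z = z_score(a_data, a_surr)
print(len(surrogates), rank, z)
```

At g = 0.05 the helper returns the 39 surrogates quoted in the text. The rank and the z score are the two measures of difference between data and surrogates mentioned above, with |z| > 1.96 as the 5% criterion.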
Fig. 10. Power spectra (average of 100 realisations of the signal) and time series plots for AR2b (black line) and AR2p (grey line). The ordinate in (b) is standardised by the mean and standard deviation with the values for AR2p displaced by −5 for clarity.
Fig. 12. The cyclic path on the Rössler attractor Rcy (black line) together with an IAAFT surrogate (dark grey line) and a PWIAAFT (t∗ = 0.10) surrogate (light grey line). The surrogates are displaced from the data by −10 and −20, respectively, for clarity.
Fig. 14. Graphs of ⟨ln Dk(t)⟩ against t for d∗ = 10. The signals Rch (a), (c) and Rcy (b), (d) are analysed using 19 IAAFT (c), (d) and PWIAAFT (t∗ = 0.10) (a), (b) surrogates.
Fig. 15. Example time series obtained by period modulation of AR2p for varying values of MT.
Fig. 16. Mean values and ±2 standard deviation error bars for the z scores from the time asymmetry test A(ψ = 1, ..., 15) for the AR2p time series with varying degrees of period modulation (Fig. 15). Results using the IAAFT method are in black and PWIAAFT surrogates are in grey.

6. Conclusion
Acknowledgements

The author kindly acknowledges support from EU 5th framework project SATSIE (contract number: EVG1-CT2002-00059) and the comments of the two anonymous referees.

References

[1] C. Angelini, D. Cava, G. Katul, B. Vidakovic, Resampling hierarchical processes in the wavelet domain: A case study using atmospheric turbulence, Physica D 207 (2005) 27–40.
[2] M. Breakspear, M. Brammer, P.A. Robinson, Construction of multivariate surrogate sets from nonlinear data using the wavelet transform, Physica D 182 (2003) 1–22.
[3] M. Breakspear, M.J. Brammer, E.T. Bullmore, P. Das, L.M. Williams, Spatiotemporal wavelet resampling for functional neuroimaging data, Hum. Brain Mapp. 23 (2004) 1–25.
[4] E. Bullmore, C. Long, J. Suckling, J. Fadili, G. Calvert, F. Zelaya, T.A. Carpenter, M. Brammer, Colored noise and computational inference in neurophysiological (fMRI) time series analysis: Resampling methods in time and wavelet domains, Hum. Brain Mapp. 12 (2001) 61–78.
[5] K.M. Kolwankar, J. Lévy Véhel, A time domain characterization of the fine local regularity of functions, J. Fourier Anal. Appl. 8 (2002) 320–334.
[6] C.J. Keylock, Constrained surrogate time series with preservation of the mean and variance structure, Phys. Rev. E 73 (2006) 036707.
[7] K. Lehnertz, R.G. Andrzejak, J. Arnhold, T. Kreuz, F. Mormann, C. Rieke, G. Widman, C.E. Elger, Nonlinear EEG analysis in epilepsy: Its possible use for interictal focus localization, seizure anticipation, and prevention, J. Clin. Neurophysiol. 18 (2001) 209–222.
[8] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1999.
[9] R.S. Patel, D. Van De Ville, F.D. Bowman, Determining significant connectivity by 4D spatiotemporal wavelet packet resampling of functional neuroimaging data, NeuroImage 31 (2006) 1142–1155.
[10] D.B. Percival, A.T. Walden, Wavelet Methods for Time Series Analysis, Cambridge University Press, 2000.
[11] D. Poggi, A. Porporato, L. Ridolfi, J.D. Albertson, G.G. Katul, Interaction between large and small scales in the canopy sublayer, Geophys. Res. Lett. 31 (2004) L05102, doi:10.1029/2003GL018611.
[12] C.P. Price, D. Prichard, The nonlinear response of the magnetosphere: 30 October 1978, Geophys. Res. Lett. 20 (1993) 771–774.
[13] J.A. Rial, Pacemaking the ice ages by frequency modulation of earth’s orbital eccentricity, Science 285 (1999) 564–568.
[14] M.T. Rosenstein, J.J. Collins, C.J. De Luca, A practical method for calculating largest Lyapunov exponents from small data sets, Physica D 65 (1993) 117–134.
[15] O.E. Rössler, An equation for continuous chaos, Phys. Lett. A 35 (1976) 397–398.
[16] T. Schreiber, A. Schmitz, Improved surrogate data for nonlinearity tests, Phys. Rev. Lett. 77 (1996) 635–638.
[17] T. Schreiber, A. Schmitz, Discrimination power of measures for nonlinearity in a time series, Phys. Rev. E 55 (1997) 5443–5447.
[18] M. Small, K. Judd, Correlation dimension: A pivotal statistic for non-constrained realizations of composite hypotheses in surrogate data analysis, Physica D 120 (1998) 386–400.
[19] M. Small, C.K. Tse, Applying the method of surrogate data to cyclic time series, Physica D 164 (2002) 187–201.
[20] M. Small, D. Yu, R.G. Harrison, Surrogate test for pseudoperiodic time series data, Phys. Rev. Lett. 87 (2001) 188101.
[21] F. Takens, Detecting strange attractors in turbulence, Lect. Notes Math. 898 (1981) 366–381.
[22] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, J.D. Farmer, Testing for nonlinearity in time-series: the method of surrogate data, Physica D 58 (1992) 77–94.
[23] J. Theiler, D. Prichard, Constrained-realization Monte-Carlo method for hypothesis testing, Physica D 94 (1996) 221–235.
[24] J. Theiler, P.E. Rapp, Re-examination of the evidence for low-dimensional, nonlinear structure in the human brain electroencephalogram, Electroencephalogr. Clin. Neurophysiol. 98 (1996) 213–222.
[25] J. Timmer, Power of surrogate data testing with respect to nonstationarity, Phys. Rev. E 58 (1998) 5153–5156.
[26] R.A.M. Van der Linden and the SIDC team, Online catalogue of the sunspot index. http://www.sidc.be/sunspot-data/.