Additional Notes 2 - Correlation and Stationary Time Series
MEASURING DEPENDENCE
(Definition) The mean function is defined as
$\mu_{X_t} = E(X_t)$
provided it exists, where $E$ denotes the usual expected value operator. When no confusion exists about which time series we are referring to, we will drop the subscript and write $\mu_{X_t}$ as $\mu_t$.
1.) White Noise
A simple kind of series is a collection of uncorrelated random variables, $W_t$, with mean 0 and finite variance $\sigma_W^2$. We denote the series as
$\{W_t\} \sim wn(0, \sigma_W^2)$
A special version of white noise is when the variables are independent and identically distributed normals, written as
$W_t \overset{iid}{\sim} \text{Normal}(0, \sigma_W^2)$
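A minimal Python sketch simulating Gaussian white noise (the seed and series length are illustrative choices, not from the notes):

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(loc=0.0, scale=1.0, size=500)  # W_t ~ iid Normal(0, sigma_W^2 = 1)
print(w.mean(), w.var())                      # both close to their targets 0 and 1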
2.) Random Walk with Drift
$X_t = \delta + X_{t-1} + W_t = \delta t + \sum_{j=1}^{t} W_j$
where $\delta$ is the drift and the initial condition is $X_0 = 0$; when $\delta = 0$, the series is a simple random walk.
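A matching Python sketch for the random walk with drift, built from white noise via a cumulative sum ($\delta = 0.2$ is an illustrative value):

import numpy as np

rng = np.random.default_rng(0)
n, delta = 500, 0.2
w = rng.normal(0.0, 1.0, n)
x = delta * np.arange(1, n + 1) + np.cumsum(w)  # X_t = delta*t + W_1 + ... + W_t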
3.) Signal Plus Noise
$X_t = 2 \cos\left(2\pi \frac{t + 15}{50}\right) + W_t$
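And a sketch of the signal-plus-noise model, a cosine with a period of 50 points buried in unit-variance noise:

import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 501)
x = 2 * np.cos(2 * np.pi * (t + 15) / 50) + rng.normal(0.0, 1.0, t.size)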
(Definition) The autocovariance function is defined as the second moment product
$\gamma_X(s, t) = Cov(X_s, X_t) = E[(X_s - \mu_s)(X_t - \mu_t)]$
for all 𝑠 and 𝑡. When no possible confusion exists about which time series we are referring to, we
will drop the subscript and write 𝛾𝑋 (𝑠, 𝑡) as 𝛾(𝑠, 𝑡).
Note that 𝛾𝑋 (𝑠, 𝑡) = 𝛾𝑋 (𝑡, 𝑠) for all time points 𝑠 and 𝑡. The autocovariance measures the
linear dependence between two points on the same series observed at different times.
Recall from classical statistics that if 𝛾𝑋 (𝑠, 𝑡) = 0, then 𝑋𝑠 and 𝑋𝑡 are not linearly related, but
there still may be some dependence structure between them. If, however, 𝑋𝑠 and 𝑋𝑡 are
Bivariate Normal, 𝜸𝑿 (𝒔, 𝒕) = 𝟎 ensures their independence.
It is clear that, for $s = t$, the autocovariance reduces to the (assumed finite) variance:
$\gamma_X(t, t) = E[(X_t - \mu_t)^2] = Var(X_t)$
For example, consider the autocovariance of the random walk
$X_t = \sum_{j=1}^{t} W_j$
To compute it, recall a useful property of linear filters: if
$U = \sum_{j=1}^{m} a_j X_j \quad \text{and} \quad V = \sum_{k=1}^{r} b_k Y_k$
are linear filters of (finite variance) random variables $X_j$ and $Y_k$, respectively, then
$Cov(U, V) = \sum_{j=1}^{m} \sum_{k=1}^{r} a_j b_k \, Cov(X_j, Y_k)$
Furthermore,
𝑉𝑎𝑟(𝑈) = 𝐶𝑜𝑣(𝑈, 𝑈)
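Applying the filter property to the random walk above, and recalling that the $W_j$ are uncorrelated with common variance $\sigma_W^2$, a standard computation gives
$\gamma_X(s, t) = Cov\left(\sum_{j=1}^{s} W_j, \sum_{k=1}^{t} W_k\right) = \min\{s, t\}\, \sigma_W^2$
Because this depends on $s$ and $t$ individually, not only on their difference, the random walk is not stationary; in particular its variance $\gamma_X(t, t) = t\, \sigma_W^2$ grows without bound.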
(Definition) The autocorrelation function (ACF) is defined as
$\rho(s, t) = \frac{\gamma(s, t)}{\sqrt{\gamma(s, s) \cdot \gamma(t, t)}}$
The ACF measures the linear predictability of the series at time 𝑡, say 𝑋𝑡 , using only the value
𝑋𝑠 . And because it is a correlation, we must have −𝟏 ≤ 𝝆(𝒔, 𝒕) ≤ 𝟏.
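Continuing the random walk example, for $s \le t$ the ACF is
$\rho(s, t) = \frac{s\, \sigma_W^2}{\sqrt{s\, \sigma_W^2 \cdot t\, \sigma_W^2}} = \sqrt{s/t}$
so nearby points are strongly correlated, and the correlation decays as the two time points move apart.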
(Definition) The cross-covariance function between two series, $X_t$ and $Y_t$, is
$\gamma_{XY}(s, t) = Cov(X_s, Y_t) = E[(X_s - \mu_{X_s})(Y_t - \mu_{Y_t})]$
and the corresponding cross-correlation function is
$\rho_{XY}(s, t) = \frac{\gamma_{XY}(s, t)}{\sqrt{\gamma_X(s, s) \cdot \gamma_Y(t, t)}}$
STATIONARITY
Stationarity requires regularity in the mean and autocorrelation functions so that these quantities (at least) may be estimated by averaging.
(Definition) A (weakly) stationary time series is one for which (i) the mean function $\mu_t$ is constant and does not depend on time $t$, and (ii) the autocovariance function $\gamma(s, t)$ depends on $s$ and $t$ only through their difference.
Because the mean function, $E(X_t) = \mu_t$, of a stationary time series is independent of time $t$, we will write $\mu_t = \mu$.
Also, because the autocovariance function, $\gamma(s, t)$, of a stationary time series, $X_t$, depends on $s$ and $t$ only through the time difference, we may simplify the notation and write, for lag $h$,
$\gamma(h) = Cov(X_{t+h}, X_t)$
The ACF of a stationary time series is then
$\rho(h) = \frac{\gamma(h)}{\gamma(0)}$
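For instance, white noise is stationary: $E(W_t) = 0$ for all $t$, and
$\gamma_W(h) = \sigma_W^2$ for $h = 0$, with $\gamma_W(h) = 0$ for $h \ne 0$
so that $\rho_W(0) = 1$ and $\rho_W(h) = 0$ otherwise, neither of which depends on $t$.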
10.) Show that a time series can have stationary behavior around a trend, for instance with
𝑋𝑡 = 𝛽𝑡 + 𝑌𝑡
where 𝑌𝑡 is stationary with mean and autocovariance functions 𝜇𝑌 and 𝛾𝑌 (ℎ), respectively. This
behavior is sometimes called trend stationarity.
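A sketch of the solution: since adding a deterministic trend shifts the mean but leaves covariances unchanged,
$E(X_t) = \beta t + \mu_Y$
$\gamma_X(t + h, t) = Cov(\beta(t+h) + Y_{t+h},\ \beta t + Y_t) = Cov(Y_{t+h}, Y_t) = \gamma_Y(h)$
The mean is not constant, so $X_t$ itself is not stationary, but the autocovariance depends only on the lag $h$: the fluctuations about the trend line are stationary.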
11.) Show that an autoregressive (AR) model, in this example the AR(1) model
$X_t = \phi X_{t-1} + W_t$
can be stationary, and identify the conditions on $\phi$ under which stationarity holds.
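A sketch, assuming $|\phi| < 1$: iterating the recursion backwards gives the linear-process form
$X_t = \sum_{j=0}^{\infty} \phi^j W_{t-j}$
so that $E(X_t) = 0$ and, for $h \ge 0$,
$\gamma(h) = Cov(X_{t+h}, X_t) = \sigma_W^2 \sum_{j=0}^{\infty} \phi^{j+h} \phi^j = \frac{\sigma_W^2\, \phi^h}{1 - \phi^2}$
Both are free of $t$, so the series is stationary; the condition $|\phi| < 1$ is what makes the infinite sum converge.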
WOLD DECOMPOSITION
Any stationary time series, $X_t$, can be written as a linear combination (filter) of white noise terms; that is,
$X_t = \mu + \sum_{j=0}^{\infty} \psi_j W_{t-j}$
where the coefficients satisfy $\sum_{j=0}^{\infty} \psi_j^2 < \infty$.
Remarks:
• Stationary time series can be thought of as filters of white noise. It may not always be the best
model, but models of this form are viable in many situations.
• Any stationary time series can be represented as a model that does not depend on the future.
• Because the coefficients satisfy $\sum_{j=0}^{\infty} \psi_j^2 < \infty$, so that $\psi_j \to 0$ as $j \to \infty$, the dependence on the distant past is negligible.
For the linear process, we may show that the mean function is $E(X_t) = \mu$, and the autocovariance function is given by
$\gamma(h) = \sigma_W^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+h}$
for $h \ge 0$.
(Definition) Two time series, say $X_t$ and $Y_t$, are jointly stationary if they are each stationary, and the cross-covariance function
$\gamma_{XY}(h) = Cov(X_{t+h}, Y_t) = E[(X_{t+h} - \mu_X)(Y_t - \mu_Y)]$
is a function only of the lag $h$.
(Definition) The cross-correlation function (CCF) of jointly stationary time series $X_t$ and $Y_t$ is defined as
$\rho_{XY}(h) = \frac{\gamma_{XY}(h)}{\sqrt{\gamma_X(0) \cdot \gamma_Y(0)}}$
As usual, $-1 \le \rho_{XY}(h) \le 1$, but the CCF is not generally symmetric about zero: when $h > 0$, $Y_t$ happens before $X_{t+h}$, whereas when $h < 0$, $Y_t$ happens after $X_{t+h}$.
12.) Consider the two series formed from the sum and difference of two successive values of a white noise process, say,
$X_t = W_t + W_{t-1}$
$Y_t = W_t - W_{t-1}$
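A sketch of the computation: both series have mean zero and are individually stationary, with $\gamma_X(0) = \gamma_Y(0) = 2\sigma_W^2$, and the cross-covariance $\gamma_{XY}(h) = Cov(X_{t+h}, Y_t)$ is
$\gamma_{XY}(0) = Cov(W_t + W_{t-1},\ W_t - W_{t-1}) = \sigma_W^2 - \sigma_W^2 = 0$
$\gamma_{XY}(1) = Cov(W_{t+1} + W_t,\ W_t - W_{t-1}) = \sigma_W^2, \qquad \gamma_{XY}(-1) = -\sigma_W^2$
with $\gamma_{XY}(h) = 0$ for $|h| \ge 2$. Since all of these depend only on $h$, the series are jointly stationary; note the asymmetry $\gamma_{XY}(1) \ne \gamma_{XY}(-1)$.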
13.) Consider the problem of determining possible leading or lagging relations between two stationary series $X_t$ and $Y_t$. If, for some unknown integer $l$, the model
$Y_t = A X_{t-l} + W_t$
holds, the series $X_t$ is said to lead $Y_t$ for $l > 0$ and to lag $Y_t$ for $l < 0$. Assuming that the noise $W_t$ is uncorrelated with the $X_t$ series, explore the properties of the cross-covariance function between $X_t$ and $Y_t$.
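A sketch: because $W_t$ is uncorrelated with the $X_t$ series,
$\gamma_{YX}(h) = Cov(Y_{t+h}, X_t) = Cov(A X_{t+h-l} + W_{t+h},\ X_t) = A\, \gamma_X(h - l)$
so the cross-covariance looks like a shifted copy of the autocovariance of $X_t$, peaking at lag $h = l$; locating that peak in the sample CCF identifies whether $X_t$ leads ($l > 0$) or lags ($l < 0$) $Y_t$.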
ESTIMATION OF CORRELATION
Sample Mean
$\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t$
Variance of the Sample Mean
$Var(\bar{x}) = \frac{1}{n} \sum_{h=-n}^{n} \left(1 - \frac{|h|}{n}\right) \gamma_x(h)$
If the process is white noise, the variance reduces to the familiar 𝜎𝑋2 /𝑛.
NOTE: In the case of dependence, the standard error of 𝑥̅ may be smaller or larger than the
white noise case depending on the nature of the correlation structure.
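A minimal Python sketch illustrating this note by Monte Carlo; the AR(1) coefficient $\phi = 0.8$, seed, and sample sizes are illustrative assumptions, chosen to show positive correlation inflating the variance of the sample mean:

import numpy as np

rng = np.random.default_rng(42)
n, reps, phi = 200, 2000, 0.8
wn_means, ar_means = [], []
for _ in range(reps):
    w = rng.normal(0.0, 1.0, n)
    wn_means.append(w.mean())           # sample mean under white noise
    x = np.empty(n)
    x[0] = w[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + w[t]    # positively correlated AR(1)
    ar_means.append(x.mean())
print("white noise Var(xbar):", np.var(wn_means), "vs. theory 1/n =", 1 / n)
print("AR(1) Var(xbar):", np.var(ar_means), "(larger under positive correlation)")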
(Definition) The sample autocorrelation function is defined as
$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)} = \frac{\sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(x_t - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}$
for $h = 0, 1, \ldots, n - 1$.
Remark:
It is important to note that this approach to estimating correlation makes sense only if the data
are stationary. If the data were not stationary, each point in the graph could be an observation
from a different correlation structure.
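Given stationary data, a minimal Python sketch of the estimator above (the helper name sample_acf is ours, not from the notes):

import numpy as np

def sample_acf(x, max_lag):
    # Sample ACF rho_hat(h), h = 0, ..., max_lag, per the formula above.
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), np.mean(x)
    denom = np.sum((x - xbar) ** 2)  # proportional to gamma_hat(0)
    return np.array([np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / denom
                     for h in range(max_lag + 1)])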
If $X_t$ is white noise, then for $n$ large and under mild conditions, the sample ACF, $\hat{\rho}_X(h)$, for $h = 1, 2, \ldots, H$, where $H$ is fixed but arbitrary, is approximately normal with mean zero and standard deviation $1/\sqrt{n}$.
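Continuing the sketch above, this sampling result can be checked empirically (the seed and sizes are again illustrative):

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, 500)            # white noise
acf = sample_acf(w, 20)
bound = 1.96 / np.sqrt(len(w))           # approximate 95% limits under white noise
print(np.mean(np.abs(acf[1:]) > bound))  # roughly 5% of lags exceed the bound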
(Definition) The estimators for the cross-covariance function $\gamma_{XY}(h)$ and the cross-correlation $\rho_{XY}(h)$ are given, respectively, by the sample cross-covariance function
$\hat{\gamma}_{XY}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(y_t - \bar{y})$
where $\hat{\gamma}_{XY}(-h) = \hat{\gamma}_{YX}(h)$ determines the function for negative lags, and the sample cross-correlation function
$\hat{\rho}_{XY}(h) = \frac{\hat{\gamma}_{XY}(h)}{\sqrt{\hat{\gamma}_X(0) \cdot \hat{\gamma}_Y(0)}}$
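A minimal Python sketch of these estimators, in the same spirit as sample_acf above (the helper name sample_ccf is ours):

import numpy as np

def sample_ccf(x, y, max_lag):
    # Sample CCF rho_hat_XY(h) for h = -max_lag, ..., max_lag.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    scale = np.sqrt(np.sum((x - xbar) ** 2) * np.sum((y - ybar) ** 2))

    def cross_sum(h):  # unnormalized gamma_hat_XY(h); negative h via gamma_hat_YX
        if h >= 0:
            return np.sum((x[h:] - xbar) * (y[:n - h] - ybar))
        return np.sum((y[-h:] - ybar) * (x[:n + h] - xbar))

    return np.array([cross_sum(h) / scale for h in range(-max_lag, max_lag + 1)])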
If $X_t$ and $Y_t$ are independent processes, then under mild conditions the large-sample distribution of $\hat{\rho}_{XY}(h)$ is normal with mean zero and standard deviation $1/\sqrt{n}$, provided at least one of the processes is independent white noise.
HANDS-ON EXERCISE
Toss a fair coin. For every toss 𝑡, let 𝑥𝑡 = 2 when a head is obtained and 𝑥𝑡 = −2 when a
tail is observed. Then, generate the stochastic process
{𝑦𝑡 : 𝑦𝑡 = 5 + 𝑥𝑡 − 0.5𝑥𝑡−1 ; 𝑡 = 1, 2, … , 𝑛}
Compute the empirical (estimated) ACF of the generated series for $n = 10$ and $n = 100$. Compare these, with respect to convergence, with the theoretical ACF of the process.
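A minimal Python sketch of this exercise, reusing the sample_acf helper defined in the estimation section. Here $x_t$ has mean 0 and variance 4, so the theoretical ACF of this MA(1)-type process is $\rho(1) = -0.5/(1 + 0.5^2) = -0.4$ and $\rho(h) = 0$ for $h > 1$; the empirical ACF for $n = 100$ should sit much closer to these values than the one for $n = 10$:

import numpy as np

rng = np.random.default_rng(1)

def simulate_y(n, rng):
    x = rng.choice([2.0, -2.0], size=n + 1)  # coin: heads -> +2, tails -> -2
    return 5.0 + x[1:] - 0.5 * x[:-1]        # y_t = 5 + x_t - 0.5 x_{t-1}

for n in (10, 100):
    y = simulate_y(n, rng)
    print(n, np.round(sample_acf(y, 3), 3))  # compare with (1, -0.4, 0, 0)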