2 Time Series Regression and Exploratory Data Analysis

2.2 Exploratory Data Analysis
Aaron Smith
This code is modified from Time Series Analysis and Its Applications, by Robert H. Shumway, David S. Stoffer (

The most recent version of the package can be found at


You can find demonstrations of astsa capabilities at

In addition, the News and ChangeLog files are at


The webpages for the texts and some help on using R for time series analysis can be found at (

UCF students can download the textbook for free through the library.

Punchline of this video:

If we have a trend-stationary time series, we use detrending to get the stationary component.
If we have a random walk time series, we use differencing to get a stationary time series.

Our time series needs to be stationary for averaging the values over time to make sense.

We use the sample autocorrelation to estimate the dependence between values of a series at different times.
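As a minimal sketch of what the sample autocorrelation computes, the R code below writes out the usual estimator $\hat{\gamma}(h) = n^{-1}\sum_{t=1}^{n-h}(x_{t+h}-\bar{x})(x_t-\bar{x})$ and $\hat{\rho}(h) = \hat{\gamma}(h)/\hat{\gamma}(0)$, applies it to simulated white noise (an illustrative stand-in series, not data from the text), and compares the result with base R's acf(). The helper name sample_acf is hypothetical.

```r
# Sample autocorrelation written out by hand and checked against acf().
# The series x is simulated white noise, used only for illustration.
set.seed(42)
x <- rnorm(200)

# Hypothetical helper: rho_hat(h) = gamma_hat(h) / gamma_hat(0) for h = 0, ..., max_lag
sample_acf <- function(x, max_lag) {
  n <- length(x)
  xbar <- mean(x)
  # gamma_hat(h) = (1/n) * sum over t = 1..n-h of (x_{t+h} - xbar) * (x_t - xbar)
  gamma_hat <- sapply(0:max_lag, function(h) {
    sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
  })
  gamma_hat / gamma_hat[1]
}

rho_manual <- sample_acf(x, max_lag = 10)
rho_stats <- as.vector(acf(x, lag.max = 10, plot = FALSE)$acf)
round(cbind(lag = 0:10, rho_manual, rho_stats), 3)
```

Both versions divide by n rather than n - h, so they agree to floating-point precision.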

When we use autocorrelation, we are assuming that the dependence between values is constant over time, that is, we assume

stationarity in mean
stationarity in autocorrelation

Often, this is not the case.

The Johnson & Johnson series has a mean that increases exponentially over time, and the increase in the
magnitude of the fluctuations around this trend causes changes in the covariance function; the variance of the
process, for example, clearly increases as one progresses over the length of the series.

Johnson and Johnson Quarterly Earnings Per Share

Johnson and Johnson quarterly earnings per share, 84 quarters (21 years) measured from the first quarter of
1960 to the last quarter of 1980.

Note the gradually increasing underlying trend and the rather regular variation superimposed on the trend that
seems to repeat over quarters.

data(list = "jj", package = "astsa")
plot(x = jj, col = 4, ylab = "Quarterly Earnings per Share")

The global temperature series shown contains some evidence of a trend over time.

data(list = "globtemp", package = "astsa")
plot(x = globtemp, col = 4, type = "o", ylab = "Global Temperature Deviations")

Trend stationary
The trend stationary model is the easiest form of nonstationarity to work with. It has stationary behavior around a trend:

$$x_t = \mu_t + y_t$$

where $\mu_t$ is the trend and $y_t$ is a stationary process.

Frequently we will estimate the trend, then find the stationary process by working with the residuals:

$$\hat{y}_t = x_t - \hat{\mu}_t$$
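The sketch below illustrates this detrending idea on simulated data where the truth is known. The linear trend $\mu_t = 5 + 0.3t$ and the AR(1) stationary component are illustrative assumptions, not values from the text; the trend is estimated by least squares and the residuals serve as $\hat{y}_t$.

```r
# Detrending a simulated trend-stationary series x_t = mu_t + y_t
# (trend coefficients and AR(1) parameter are illustrative choices)
set.seed(1)
n <- 200
time_index <- 1:n
mu <- 5 + 0.3 * time_index                                 # true linear trend mu_t
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))  # stationary AR(1) component y_t
x <- mu + y                                                # observed series

fit <- lm(x ~ time_index)   # least-squares estimate of beta_0 + beta_1 t
mu_hat <- fitted(fit)       # estimated trend mu_hat_t
y_hat <- x - mu_hat         # detrended series y_hat_t = x_t - mu_hat_t

coef(fit)                   # slope should be close to 0.3
cor(y_hat, y)               # detrended series tracks the true y_t closely
```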

Example 2.4 Detrending Chicken Prices

Let’s use a trend stationary model

$$x_t = \mu_t + y_t$$

$$\mu_t = \beta_0 + \beta_1 t$$

load the data

data(list = "chicken", package = "astsa")

lm(formula = chicken ~ time(chicken))

## Call:
## lm(formula = chicken ~ time(chicken))
## Coefficients:
## (Intercept) time(chicken)
## -7131.022 3.592

plot the time series

plot(x = chicken, main = "original time series")

$$\hat{\mu}_t = -7131 + 3.59\,t$$

$$\hat{y}_t = x_t + 7131 - 3.59\,t$$

# residuals from the fitted linear trend, plotted against time
plot(x = time(chicken),
     y = chicken - predict(object = lm(formula = chicken ~ time(chicken))),
     type = "l")

plot the detrended time series

# astsa now has a detrend script, so Figure 2.4 can be done as
plot(x = astsa::detrend(series = chicken), main = "detrended")

plot the difference between observations as a time series

plot(x = diff(x = chicken), main = "first difference")

Now suppose the trend follows a random walk with drift model,

$$\mu_t = \delta + \mu_{t-1} + w_t$$

where $\delta$ is the drift and $w_t$ is white noise.

If $x_t = \mu_t + y_t$ with $y_t$ stationary, and the trend $\mu_t$ is a random walk with drift, then

$$\begin{aligned}
x_t - x_{t-1} &= (\mu_t + y_t) - (\mu_{t-1} + y_{t-1}) \\
&= (\mu_t - \mu_{t-1}) + (y_t - y_{t-1}) \\
&= (\delta + w_t) + (y_t - y_{t-1})
\end{aligned}$$

Since $\delta$ is constant, $E(w_t) = 0$, and $y_t$ is stationary, the difference of consecutive observations has constant expected value $\delta$.

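Let $z_t = y_t - y_{t-1}$. Then

$$\begin{aligned}
\gamma_z(h) &= \operatorname{cov}(z_{t+h}, z_t) \\
&= \operatorname{cov}(y_{t+h} - y_{t+h-1},\; y_t - y_{t-1}) \\
&= \operatorname{cov}(y_{t+h}, y_t) - \operatorname{cov}(y_{t+h}, y_{t-1}) - \operatorname{cov}(y_{t+h-1}, y_t) + \operatorname{cov}(y_{t+h-1}, y_{t-1}) \\
&= \gamma_y(h) - \gamma_y(h+1) - \gamma_y(h-1) + \gamma_y(h) \\
&= 2\gamma_y(h) - \gamma_y(h+1) - \gamma_y(h-1)
\end{aligned}$$

which depends only on the lag $h$, not on time $t$.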
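A quick numerical check of this formula, assuming for illustration that $y_t$ is an AR(1) process with $\phi = 0.6$ and unit noise variance, so that $\gamma_y(h) = \phi^{|h|}/(1-\phi^2)$. The sketch compares the formula with the sample autocovariance of a long simulated differenced series; the value of $\phi$, the seed, and the series length are arbitrary choices.

```r
# Check gamma_z(h) = 2*gamma_y(h) - gamma_y(h + 1) - gamma_y(h - 1) for z_t = y_t - y_{t-1},
# assuming (for illustration only) that y_t is AR(1) with phi = 0.6
phi <- 0.6
sigma2 <- 1

# Theoretical AR(1) autocovariance; abs(h) handles gamma_y(-1) = gamma_y(1)
gamma_y <- function(h) sigma2 * phi^abs(h) / (1 - phi^2)
gamma_z <- function(h) 2 * gamma_y(h) - gamma_y(h + 1) - gamma_y(h - 1)

lags <- 0:3
theory <- gamma_z(lags)

# Sample autocovariance of a long simulated differenced series
set.seed(2)
y <- arima.sim(model = list(ar = phi), n = 1e5, sd = sqrt(sigma2))
z <- diff(y)
sample_cov <- as.vector(acf(z, lag.max = 3, type = "covariance", plot = FALSE)$acf)

round(rbind(theory, sample_cov), 3)
```

For this AR(1) choice the lag-1 value is $-(1-\phi)^2\gamma_y(0) = -0.25$, so differencing introduces negative autocorrelation at lag 1.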

An advantage of differencing is that no parameter is estimated.

A disadvantage of differencing is that it does not provide an estimate of the stationary component $y_t$.

Use differencing when you want a stationary time series from a non-stationary time series.

Use detrending if you want to estimate a stationary component $y_t$.

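If $x_t = \mu_t + y_t$ and $\mu_t = \beta_0 + \beta_1 t$, then

$$\begin{aligned}
x_t - x_{t-1} &= (\mu_t + y_t) - (\mu_{t-1} + y_{t-1}) \\
&= (\beta_0 + \beta_1 t + y_t) - (\beta_0 + \beta_1 (t-1) + y_{t-1}) \\
&= \beta_1 + y_t - y_{t-1}
\end{aligned}$$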

differencing notation

$$\nabla x_t = x_t - x_{t-1}$$

We use first differences to eliminate a linear trend.

We use second differences to eliminate a quadratic trend.
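A noise-free sketch of this idea, with illustrative trend coefficients: first differences of a linear trend are the constant $\beta_1$, first differences of a quadratic trend still contain a linear trend, and second differences (base R's diff() with differences = 2) of a quadratic trend $\beta_0 + \beta_1 t + \beta_2 t^2$ are the constant $2\beta_2$.

```r
# Differencing removes polynomial trends (noise-free; coefficients are illustrative)
time_index <- 1:10
linear <- 3 + 2 * time_index                          # beta_0 = 3, beta_1 = 2
quadratic <- 1 + 2 * time_index + 0.5 * time_index^2  # beta_2 = 0.5

d1_linear <- diff(linear)                         # every value equals beta_1 = 2
d1_quadratic <- diff(quadratic)                   # 3.5, 4.5, ..., 11.5: still a linear trend
d2_quadratic <- diff(quadratic, differences = 2)  # every value equals 2 * beta_2 = 1

d1_linear
d1_quadratic
d2_quadratic
```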

the backshift operator

$$B x_t = x_{t-1}$$

$$B^2 x_t = B(B x_t) = B(x_{t-1}) = x_{t-2}$$

$$B^k x_t = x_{t-k}$$

$$B^{-1} B x_t = x_t = B B^{-1} x_t \quad (B^{-1} \text{ is the forward shift operator})$$

$$B^0 x_t = x_t$$

$$\nabla x_t = (B^0 - B)\, x_t$$

$$\nabla^2 x_t = (B^0 - B)^2 x_t = (B^0 - 2B + B^2)\, x_t = x_t - 2x_{t-1} + x_{t-2}$$

$$\nabla^2 x_t = \nabla(x_t - x_{t-1}) = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2}) = x_t - 2x_{t-1} + x_{t-2}$$

Definition 2.5 Differences of order d

$$\nabla^d = (B^0 - B)^d$$
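Expanding $(B^0 - B)^d$ with the binomial theorem gives $\nabla^d x_t = \sum_{k=0}^{d} (-1)^k \binom{d}{k} x_{t-k}$. The sketch below implements that sum with a hypothetical helper nabla_d and checks it against base R's diff(x, differences = d) on an arbitrary simulated series.

```r
# Hypothetical helper: d-th difference from the binomial expansion of (B^0 - B)^d,
#   nabla^d x_t = sum_{k=0}^{d} (-1)^k * choose(d, k) * x_{t-k}
nabla_d <- function(x, d) {
  n <- length(x)
  idx <- (d + 1):n                  # times t at which x_{t-d} exists
  out <- numeric(length(idx))
  for (k in 0:d) {
    out <- out + (-1)^k * choose(d, k) * x[idx - k]   # B^k x_t = x_{t-k}
  }
  out
}

set.seed(3)
x <- cumsum(rnorm(50))              # arbitrary illustrative series
all.equal(nabla_d(x, 2), diff(x, differences = 2))
all.equal(nabla_d(x, 3), diff(x, differences = 3))
```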

The first difference is a linear filter applied to eliminate a trend.

Other filters, formed by averaging values near x t , can produce adjusted series that eliminate other kinds of
unwanted fluctuations.
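Both kinds of filter can be sketched with stats::filter(), which applies a set of weights by convolution. The numbers and the three-point equal weights below are illustrative: sides = 2 gives a centered moving average, and weights c(1, -1) with sides = 1 reproduce the first difference $x_t - x_{t-1}$.

```r
# Linear filters via stats::filter (written with the stats:: prefix because
# dplyr::filter masks it when dplyr is attached); the data are illustrative
x <- c(2, 4, 6, 5, 3, 7, 9, 8)

# Centered 3-point moving average: (x_{t-1} + x_t + x_{t+1}) / 3
ma3 <- stats::filter(x, filter = rep(1/3, 3), method = "convolution", sides = 2)

# First difference as a filter with weights (1, -1): x_t - x_{t-1}
d1 <- stats::filter(x, filter = c(1, -1), method = "convolution", sides = 1)

ma3   # NA at both ends, where a full window is not available
d1    # NA at the start, then the same values as diff(x)
```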

The differencing technique is an important component of the ARIMA model of Box and Jenkins.

Example 2.5 Differencing Chicken Prices

The first difference of the chicken prices series produces different results than removing the trend by detrending via regression.

The differenced series does not contain the long (five-year) cycle we observe in the detrended series.

The differenced series exhibits an annual cycle that was obscured in the original or detrended data.

plot the autocorrelation of the time series, detrended time series, and the differences

# and Figure 2.5 as
# par(mfrow = c(3, 1)) # plot ACFs
astsa::acf1(series = chicken, max.lag = 48, main = "chicken")

## [1] 0.99 0.97 0.95 0.93 0.91 0.89 0.87 0.86 0.84 0.82 0.80 0.78 0.75 0.73 0.71
## [16] 0.68 0.66 0.63 0.61 0.59 0.57 0.55 0.53 0.50 0.48 0.46 0.44 0.43 0.41 0.40
## [31] 0.38 0.37 0.37 0.36 0.35 0.34 0.33 0.31 0.30 0.28 0.27 0.26 0.25 0.24 0.23
## [46] 0.22 0.21 0.20

astsa::acf1(series = astsa::detrend(series = chicken), max.lag = 48, main = "detrended")

## [1] 0.97 0.91 0.83 0.75 0.68 0.61 0.56 0.51 0.48 0.46 0.43 0.39
## [13] 0.33 0.26 0.20 0.14 0.08 0.03 0.00 -0.03 -0.04 -0.05 -0.07 -0.10
## [25] -0.13 -0.18 -0.21 -0.24 -0.25 -0.25 -0.23 -0.20 -0.16 -0.13 -0.11 -0.10
## [37] -0.11 -0.13 -0.14 -0.16 -0.17 -0.16 -0.15 -0.13 -0.10 -0.08 -0.05 -0.04

astsa::acf1(series = diff(x = chicken), max.lag = 48, main = "first difference")

## [1] 0.72 0.39 0.09 -0.07 -0.16 -0.20 -0.27 -0.23 -0.11 0.09 0.26 0.33
## [13] 0.20 0.07 -0.03 -0.10 -0.19 -0.25 -0.29 -0.20 -0.08 0.08 0.16 0.18
## [25] 0.08 -0.06 -0.21 -0.31 -0.40 -0.40 -0.33 -0.18 0.02 0.20 0.30 0.35
## [37] 0.26 0.13 -0.02 -0.14 -0.23 -0.21 -0.18 -0.11 -0.03 0.08 0.21 0.33

Example 2.6 Differencing Global Temperature

The global temperature series appears to behave more as a random walk than a trend stationary series.

Rather than detrend the data, it would be more appropriate to use differencing to coerce it into stationarity.

In this case it appears that the differenced process shows minimal autocorrelation, which may imply the global
temperature series is nearly a random walk with drift.

It is interesting to note that if the series is a random walk with drift, the mean of the differenced series, which is
an estimate of the drift, is about .008, or an increase of about one degree centigrade per 100 years.
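Why the mean of the differences estimates the drift, and a property worth knowing: the sum of first differences telescopes, so their mean equals $(x_n - x_1)/(n-1)$. The sketch below checks this on a simulated random walk with drift; the drift $\delta = 0.01$, the noise level, and the length are illustrative assumptions, not the temperature data.

```r
# Drift estimate for a simulated random walk with drift x_t = delta + x_{t-1} + w_t
set.seed(4)
n <- 500
delta <- 0.01                               # illustrative drift
w <- rnorm(n, mean = 0, sd = 0.1)           # white noise w_t
x <- cumsum(delta + w)                      # random walk with drift, starting from x_0 = 0

drift_hat <- mean(diff(x))                  # mean of first differences
endpoint_form <- (x[n] - x[1]) / (n - 1)    # telescoped form of the same quantity
c(drift_hat = drift_hat, endpoint_form = endpoint_form)
```

Because only the first and last observations enter, the estimate is sensitive to the values at the two ends of the record.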

load the data

data(list = c("globtemp", "gtemp"), package = "astsa")

plot the time series

plot(x = globtemp)

plot(x = gtemp)

plot(x = diff(x = globtemp), type = "o")

plot(x = diff(x = gtemp), type = "o")

mean(x = diff(x = globtemp)) # drift estimate = .008

## [1] 0.007925926

mean(x = diff(x = gtemp)) # drift estimate = .0066

## [1] 0.006589147

autocorrelation of the differences

astsa::acf1(series = diff(x = globtemp), max.lag = 48, main = "")

## [1] -0.24 -0.19 -0.08 0.20 -0.15 -0.03 0.03 0.14 -0.16 0.11 -0.05 0.00
## [13] -0.13 0.14 -0.01 -0.08 0.00 0.19 -0.07 0.02 -0.02 0.08 -0.12 -0.07
## [25] 0.10 0.13 -0.15 -0.01 0.09 0.00 -0.09 0.07 -0.03 -0.13 0.06 -0.06
## [37] 0.09 0.01 0.09 -0.06 -0.12 0.00 0.13 -0.03 0.00 0.01 0.10 -0.06

astsa::acf1(series = diff(x = gtemp), max.lag = 48, main = "")

## [1] -0.29 -0.16 -0.12 0.22 -0.15 0.02 0.03 0.11 -0.20 0.15 0.04 -0.07
## [13] -0.17 0.15 0.06 -0.08 0.00 0.14 -0.14 0.04 0.00 0.11 -0.13 -0.03
## [25] 0.08 0.10 -0.23 0.07 0.07 -0.01 -0.11 0.15 -0.05 -0.10 0.02 -0.03
## [37] 0.06 0.00 0.07 -0.05 -0.12 0.04 0.13 -0.03 -0.04 -0.01 0.11 -0.09

Frequently, a log transformation of a time series will equalize the variability over the length of the series, especially if larger fluctuations tend to appear with larger observed values:

$$y_t = \log(x_t)$$

Box-Cox transformation
Frequently we use the Box-Cox transformation to make a variable look closer to normally distributed, or to improve it as an input for predicting another time series.

$$y_t = \begin{cases} \dfrac{x_t^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\[4pt] \log(x_t) & \text{if } \lambda = 0 \end{cases}$$
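A minimal R sketch of the transformation as written, using a hypothetical helper box_cox that assumes positive data. The checks show that $\lambda = 1$ is just a shift ($x_t - 1$) and that a small $\lambda$ approaches the log case, which is why $\lambda = 0$ is defined as $\log(x_t)$.

```r
# Hypothetical helper implementing the Box-Cox transformation above (requires x > 0)
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) {
    log(x)                      # lambda = 0 case
  } else {
    (x^lambda - 1) / lambda     # lambda != 0 case
  }
}

x <- c(1, 2, 5, 10, 50)         # illustrative positive values
box_cox(x, lambda = 1)          # x - 1
box_cox(x, lambda = 0.5)        # 2 * (sqrt(x) - 1)
box_cox(x, lambda = 0)          # log(x)
box_cox(x, lambda = 1e-6)       # nearly identical to log(x)
```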

Example 2.7 Paleoclimatic Glacial Varves

Melting glaciers deposit yearly layers of sand and silt during the spring melting seasons, which can be
reconstructed yearly over a period ranging from the time deglaciation began in New England (about 12,600
years ago) to the time it ended (about 6,000 years ago). Such sedimentary deposits, called varves, can be
used as proxies for paleoclimatic parameters, such as temperature, because, in a warm year, more sand and
silt are deposited from the receding glacier.

The plot shows the thicknesses of the yearly varves collected from one location in Massachusetts for 634
years, beginning 11,834 years ago. For further information.

data(list = "varve", package = "astsa")

time series plot of the time series

# layout(matrix(1:4, 2), widths = c(2.5, 1))
plot(x = varve, main = "", ylab = "", col = 4)
mtext(text = "varve", side = 3, line = 0.5, cex = 1.2, font = 2, adj = 0)

Because the variation in thicknesses increases in proportion to the amount deposited, a logarithmic
transformation could remove the nonstationarity observable in the variance as a function of time. It is clear that
this improvement has occurred.

time series of the log-transform of the time series

plot(x = log(varve), main = "", ylab = "", col = 4)
mtext(text = "log(varve)", side = 3, line = 0.5, cex = 1.2, font = 2, adj = 0)

We may also plot the histogram of the original and transformed data to argue that the approximation to
normality is improved. The ordinary first differences. We note that the first differences have a

normal plots of the time series and the log-transformed time series

hist(x = varve)

qqnorm(y = varve, main = "", col = 4)
qqline(y = varve, col = 2, lwd = 2)

hist(x = log(varve))

qqnorm(y = log(varve), main = "", col = 4)
qqline(y = log(varve), col = 2, lwd = 2)

Scatterplot matrices for lagged data

We use scatterplot matrices to visualize the relationship between a time series and its lags.

The autocorrelation function tells us whether a substantial linear relation exists between the series and its own
lagged values. The ACF gives a profile of the linear correlation at all possible lags and shows which values of h
lead to the best predictability.

This idea is restricted to linear predictability; it ignores nonlinear relationships between a time series
and its lags.
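To see why linear correlation alone can miss a lagged relationship, the sketch below simulates an illustrative pair of series in which $r_t = s_{t-6}^2$ plus small noise. The relationship is strong but symmetric in $s_{t-6}$, so the linear correlation is close to zero while a quadratic fit explains nearly all of the variation. The series names, lag, and noise level are assumptions for illustration only.

```r
# A strong but nonlinear lagged relation that linear correlation misses
set.seed(5)
n <- 1e5
lag_k <- 6
s <- rnorm(n)                                   # illustrative driver series s_t
s_lag <- c(rep(NA, lag_k), s[1:(n - lag_k)])    # s_{t-6} aligned with time t
r <- s_lag^2 + rnorm(n, sd = 0.1)               # r_t depends on s_{t-6} quadratically

ok <- !is.na(s_lag)                             # drop the first lag_k times
linear_cor <- cor(r[ok], s_lag[ok])
quad_fit <- lm(r ~ s_lag + I(s_lag^2), subset = ok)
c(linear_cor = linear_cor, quad_r2 = summary(quad_fit)$r.squared)
```

A lag plot with a lowess smooth would show the U shape immediately, even though the correlation printed in its corner is near zero.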

Example 2.8 Scatterplot Matrices, SOI and Recruitment Series

To check for nonlinear relations of this form, it is convenient to display a lagged scatterplot matrix.

The sample autocorrelations are displayed in the upper right-hand corner and superimposed on the
scatterplots are locally weighted scatterplot smoothing (lowess) lines that can be used to help discover any nonlinearities.

load the data

data(list = c("soi", "rec"), package = "astsa")

We notice that lags 1, 12, 2, and 11 have the strongest correlations. SOI is measured monthly, so lag 12 (soi(t-12))
corresponds to the same month in the previous year.

lag plot on soi

astsa::lag1.plot(series = soi, max.lag = 12,
                 col = astsa::astsa.col(col = 4, alpha = 0.3),
                 cex = 1.5, pch = 20)

In a previous video we established a relationship between SOI and the recruitment time series.

We see that there is a relationship between recruitment and SOI at lags 5, 6, 7, and 8.

The negative correlations indicate that increases (decreases) in SOI lead to decreases (increases) in recruitment.

The curvature in the lowess lines leads us to conjecture that positive and negative values of SOI have different impacts on recruitment.

lag plot of soi leading rec

astsa::lag2.plot(series1 = soi, series2 = rec, max.lag = 8,
                 col = astsa::astsa.col(col = 4, alpha = 0.3),
                 cex = 1.5, pch = 20)

Example 2.9 Regression with Lagged Variables

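$$R_t = \beta_0 + \beta_1 S_{t-6} + w_t$$

Let’s expand this model with a dummy variable to incorporate the positive/negative findings for SOI:

$$R_t = \beta_0 + \beta_1 S_{t-6} + \beta_2 D_{t-6} + \beta_3 D_{t-6} S_{t-6} + w_t$$

$$D_t = \begin{cases} 0 & \text{if } S_t < 0 \\ 1 & \text{if } S_t \ge 0 \end{cases}$$

$$R_t = \begin{cases} \beta_0 + \beta_1 S_{t-6} + w_t & \text{if } S_{t-6} < 0 \\ (\beta_0 + \beta_2) + (\beta_1 + \beta_3) S_{t-6} + w_t & \text{if } S_{t-6} \ge 0 \end{cases}$$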
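The piecewise form can be checked directly: because the model has a separate intercept and slope for each sign of $S_{t-6}$, least squares on the dummy-interaction model gives exactly the same two lines as fitting each half separately. The sketch below uses simulated data with illustrative coefficients ($\beta_0 = 70$, $\beta_1 = -15$, $\beta_2 = 0$, $\beta_3 = -50$), not the SOI and recruitment series; S stands in for $S_{t-6}$.

```r
# Dummy-variable interaction model versus two separate regressions (simulated data)
set.seed(6)
n <- 400
S <- rnorm(n)                       # stands in for S_{t-6}
D <- ifelse(S < 0, 0, 1)            # D = 0 if S < 0, 1 if S >= 0
R <- 70 - 15 * S - 50 * D * S + rnorm(n, sd = 5)

fit <- lm(R ~ S * D)                # R = b0 + b1 S + b2 D + b3 D S + error
b <- coef(fit)

# Separate regressions on each side of zero
fit_neg <- lm(R ~ S, subset = S < 0)
fit_pos <- lm(R ~ S, subset = S >= 0)

rbind(
  negative = c(intercept = b[["(Intercept)"]], slope = b[["S"]]),
  positive = c(intercept = b[["(Intercept)"]] + b[["D"]], slope = b[["S"]] + b[["S:D"]])
)
```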

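# dummy variable: 0 when SOI is negative, 1 otherwise
dummy <- ifelse(test = soi < 0, yes = 0, no = 1)
# align recruitment with SOI and the dummy lagged by 6 months
# (stats::lag is used explicitly because dplyr masks lag)
fish <- ts.intersect(rec = rec,
                     soiL6 = stats::lag(x = soi, k = -6),
                     dL6 = stats::lag(x = dummy, k = -6),
                     dframe = TRUE)
lm_fish <- lm(formula = rec ~ soiL6*dL6, data = fish, na.action = NULL)
summary(object = lm_fish)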

## Call:
## lm(formula = rec ~ soiL6 * dL6, data = fish, na.action = NULL)
## Residuals:
## Min 1Q Median 3Q Max
## -63.291 -15.821 2.224 15.791 61.788
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.479 2.865 25.998 < 2e-16 ***
## soiL6 -15.358 7.401 -2.075 0.0386 *
## dL6 -1.139 3.711 -0.307 0.7590
## soiL6:dL6 -51.244 9.523 -5.381 1.2e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 21.84 on 443 degrees of freedom
## Multiple R-squared: 0.4024, Adjusted R-squared: 0.3984
## F-statistic: 99.43 on 3 and 443 DF, p-value: < 2.2e-16

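plot(x = fish$soiL6, y = fish$rec, type = 'p', col = 4, ylab = 'rec', xlab = 'soiL6')
# lowess smooth of the data
lines(x = lowess(x = fish$soiL6, y = fish$rec), col = 4, lwd = 2)
# fitted values from the dummy-variable regression
points(x = fish$soiL6, y = fitted(object = lm_fish), pch = '+', col = 6)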

time series plot of the residuals; there is autocorrelation in the residuals

plot(x = resid(object = lm_fish), type = "l") # not shown ...

astsa::acf1(series = resid(object = lm_fish)) # ... but obviously not noise

## [1] 0.69 0.62 0.49 0.37 0.24 0.15 0.08 0.00 -0.03 -0.10 -0.13 -0.16
## [13] -0.17 -0.23 -0.24 -0.23 -0.23 -0.22 -0.17 -0.09 -0.05 0.01 0.05 0.06
## [25] 0.09 0.07 0.10 0.06 0.02 -0.02 -0.02 -0.02 -0.03 -0.02 0.00 0.01
## [37] -0.01 -0.04 -0.07 -0.05 -0.06 -0.03 -0.02 0.01 0.04 0.04 0.08 0.08

Example 2.10 Using Regression to Discover a Signal in Noise
Frequently we can statistically capture periodic behavior without knowing the mathematical function of the signal.

The trigonometric identities and the orthogonality of the sine and cosine terms in a Fourier series enable regression to estimate periodic signals.

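$$\cos(\alpha + \beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta)$$

$$\cos\left(2\pi x + \frac{3\pi}{5}\right) = \cos(2\pi x)\cos\left(\frac{3\pi}{5}\right) - \sin(2\pi x)\sin\left(\frac{3\pi}{5}\right)$$

$$2\cos\left(2\pi x + \frac{3\pi}{5}\right) = 2\cos\left(\frac{3\pi}{5}\right)\cos(2\pi x) - 2\sin\left(\frac{3\pi}{5}\right)\sin(2\pi x)$$

$$2\cos\left(2\pi x + \frac{3\pi}{5}\right) \approx -0.618034\cos(2\pi x) - 1.902113\sin(2\pi x)$$

true coefficients: $-0.618034$, $-1.902113$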
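A noise-free check of these numbers: regressing the pure signal on $\cos(2\pi t/50)$ and $\sin(2\pi t/50)$ with no intercept (the frequency used in the code for this example) recovers $2\cos(3\pi/5)$ and $-2\sin(3\pi/5)$ exactly, up to floating-point error. This sketch only removes the noise term; everything else mirrors the example.

```r
# Noise-free regression on cosine and sine at the known frequency
time_index <- 1:500
signal <- 2 * cos(2 * pi * time_index / 50 + 0.6 * pi)   # pure signal, no noise
z1 <- cos(2 * pi * time_index / 50)
z2 <- sin(2 * pi * time_index / 50)

fit_exact <- lm(signal ~ 0 + z1 + z2)                    # no intercept, as in the example
coef(fit_exact)
c(2 * cos(3 * pi / 5), -2 * sin(3 * pi / 5))             # the identity's coefficients
```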

set.seed(seed = 823) # so you can reproduce these results
x <- 2*cos(x = 2*pi*(1:500)/50 + 0.6*pi) + rnorm(n = 500, mean = 0, sd = 5)
z1 <- cos(x = 2*pi*(1:500)/50)
z2 <- sin(x = 2*pi*(1:500)/50)
M_trig <- data.frame(x = x, z1 = z1, z2 = z2)
lm_trig <- lm(formula = x ~ 0 + z1 + z2, data = M_trig) # zero to exclude the intercept
summary(object = lm_trig)

## Call:
## lm(formula = x ~ 0 + z1 + z2, data = M_trig)
## Residuals:
## Min 1Q Median 3Q Max
## -14.1836 -2.9692 -0.0714 3.4311 14.0427
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z1 -0.6126 0.2986 -2.052 0.0407 *
## z2 -1.6664 0.2986 -5.581 3.94e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 4.721 on 498 degrees of freedom
## Multiple R-squared: 0.06629, Adjusted R-squared: 0.06254
## F-statistic: 17.68 on 2 and 498 DF, p-value: 3.828e-08

plot.ts(x = x, col = 4)

plot.ts(x = x, col = astsa::astsa.col(col = 4, alpha = 0.7), ylab = expression(hat(x)))
lines(x = fitted(object = lm_trig), col = 2, lwd = 2)

increase the sample size to show convergence of the coefficients

set.seed(seed = 823) # so you can reproduce these results
x <- 2*cos(x = 2*pi*(1:(1e6))/(1e5) + 0.6*pi) + rnorm(n = 1e6, mean = 0, sd = 5)
z1 <- cos(x = 2*pi*(1:(1e6))/(1e5))
z2 <- sin(x = 2*pi*(1:(1e6))/(1e5))
M_trig <- data.frame(x = x, z1 = z1, z2 = z2)
lm_trig <- lm(formula = x ~ 0 + z1 + z2, data = M_trig) # zero to exclude the intercept
summary(object = lm_trig)

## Call:
## lm(formula = x ~ 0 + z1 + z2, data = M_trig)
## Residuals:
## Min 1Q Median 3Q Max
## -23.8745 -3.3805 -0.0108 3.3583 25.4331
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z1 -0.617281 0.007065 -87.38 <2e-16 ***
## z2 -1.908289 0.007065 -270.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 4.995 on 999998 degrees of freedom
## Multiple R-squared: 0.07459, Adjusted R-squared: 0.07458
## F-statistic: 4.03e+04 on 2 and 999998 DF, p-value: < 2.2e-16

Write the estimated model as a single trigonometric function. In general we write a sinusoid using sine, but
since the original function is written as a cosine, we will use cosine. (Sine and cosine are shifts of each other.)
Since the author used 2π in every trigonometric function, we take the point of view that the period/frequency is
known; the amplitude and phase shift are unknown. Since there is no intercept, there is no vertical shift (the signal has mean zero).

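$$\begin{aligned}
-0.6172813\cos(2\pi t) - 1.9082887\sin(2\pi t) &= \hat{A}\cos(2\pi t + \hat{\theta}) \\
&= \hat{A}\cos(2\pi t)\cos(\hat{\theta}) - \hat{A}\sin(2\pi t)\sin(\hat{\theta})
\end{aligned}$$

Matching the coefficients of $\cos(2\pi t)$ and $\sin(2\pi t)$:

$$\hat{A}\cos(\hat{\theta}) = -0.6172813$$

$$-\hat{A}\sin(\hat{\theta}) = -1.9082887$$

$$\hat{A}^2\cos^2(\hat{\theta}) + \hat{A}^2\sin^2(\hat{\theta}) = \hat{A}^2 = (-0.6172813)^2 + (-1.9082887)^2 = 4.022602$$

$$|\hat{A}| = 2.005643$$

$$\cos(\hat{\theta}) = -0.3077723$$

$$\sin(\hat{\theta}) = 0.9514600$$

Because $\sin(\hat{\theta}) > 0$, $\hat{\theta}$ lies in $(0, \pi)$, the range of $\cos^{-1}$, so

$$\hat{\theta} = \cos^{-1}(-0.3077723) = 1.883647$$

$$\hat{A}\cos(2\pi t + \hat{\theta}) = 2.005643\cos(2\pi t + 1.883647)$$

Compare with the true signal:

$$A\cos(2\pi t + \theta) = 2\cos\left(2\pi t + \frac{3\pi}{5}\right), \qquad \frac{3\pi}{5} \approx 1.884956$$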
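The same conversion in R, as a sketch with a hypothetical helper amp_phase: from $\hat{A}\cos(\hat{\theta}) = b_1$ and $\hat{A}\sin(\hat{\theta}) = -b_2$, where $b_1$ and $b_2$ are the cosine and sine coefficients, we get $\hat{A} = \sqrt{b_1^2 + b_2^2}$ and $\hat{\theta} = \operatorname{atan2}(-b_2, b_1)$. Using atan2 instead of $\cos^{-1}$ picks the correct quadrant automatically, so the sign check on $\sin(\hat{\theta})$ is not needed.

```r
# Hypothetical helper: amplitude and phase from cosine (b1) and sine (b2) coefficients,
# using A*cos(theta) = b1 and A*sin(theta) = -b2
amp_phase <- function(b1, b2) {
  A <- sqrt(b1^2 + b2^2)
  theta <- atan2(-b2, b1)     # quadrant-aware inverse tangent, result between -pi and pi
  c(A = A, theta = theta)
}

estimated <- amp_phase(-0.6172813, -1.9082887)                  # coefficients from the n = 1e6 fit
truth <- amp_phase(2 * cos(3 * pi / 5), -2 * sin(3 * pi / 5))   # true coefficients
rbind(estimated, truth)
```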

library(ggplot2)
t0 <- seq(from = 0, to = 1, length.out = 10000)
x_correct <- 2*cos(2*pi*t0 + 3*pi/5)
x_estimated <- 2.005643*cos(2*pi*t0 + 1.883647)
M <- data.frame(t0 = t0, correct_model = x_correct, estimated_model = x_estimated)
# reshape to long format: one row per (t0, model) pair, keeping t0 as a column
M <- tidyr::gather(data = M, key = "model", value = "x", -t0)
ggplot(M) +
  aes(x = t0, y = x, group = model, color = model) +
  geom_line()


