
Chapter 4. Correlation and Autocorrelation

Nguyen VP Nguyen, Ph.D.


Department of Industrial & Systems Engineering, HCMUT
Email: nguyennvp@hcmut.edu.vn

Correlation coefficient

• The correlation coefficient, denoted r, quantifies the degree and
direction of the linear relationship between two variables.
• The correlation coefficient ranges from -1 to 1:
 -1: perfect negative linear relationship
 0: no linear relationship
 1: perfect positive linear relationship


Interpreting the strength of the correlation

• Interpreting the strength of the correlation:
 -1.0 to -0.7: strong negative correlation
 -0.7 to -0.3: moderate negative correlation
 -0.3 to 0: weak negative correlation
 0: no correlation
 0 to 0.3: weak positive correlation
 0.3 to 0.7: moderate positive correlation
 0.7 to 1.0: strong positive correlation
• Note the following:
 Magnitude: the further r is from 0, the stronger the correlation. The
sign (positive or negative) indicates the direction of the relationship,
not its strength.
 No Causation Implication: A correlation, no matter how strong,
does not imply causation. Just because two variables move
together does not mean one caused the other.
 Type of Relationship: r measures the strength of a linear
relationship. Non-linear relationships might not be well-
represented by r.
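
As a quick illustration of computing r in practice, here is a minimal
Python sketch using NumPy; the two data series are made up for the example.

```python
import numpy as np

# Hypothetical sample data: two variables measured on the same units
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

# Pearson correlation coefficient r: off-diagonal entry of the 2x2 matrix
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")  # close to +1: strong positive linear relationship
```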

Autocorrelation
• Def 1: Autocorrelation is a correlation of the values of a variable
with values of the same variable lagged one or more periods back.
• Consequences of autocorrelation include inaccurate estimates of
variances and inaccurate predictions.
• Autocorrelation violates the regression assumption that residuals are
random and independent.

[Figure: Time (t) residual plot. The residuals show a cyclic pattern
over time, not a random one.]

• Other definitions:
 Def 2: Autocorrelation is the 'lag correlation of a given series
within itself, lagged by a number of time units'.
 Def 3: Serial correlation is the 'lag correlation between two
different series'.


Autocorrelation illustration
• Autocorrelation is the correlation between a variable lagged one or
more time periods and itself.
 Data patterns, including components such as trend and seasonality,
can be studied using autocorrelations.
• The time index t stays the same across Yt, Yt-1, and Yt-2.
• The variables Yt-1 and Yt-2 are actually the Y values that have been
lagged by one and two time periods, respectively.
• For example, in the row for March: Yt = 125 (March sales), Yt-1 = 130
(February sales), and Yt-2 = 123 (January sales).
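
A minimal sketch of how such lagged columns can be built with pandas. The
January-to-March values match the slide's example; the remaining months
are invented for illustration.

```python
import pandas as pd

# Monthly sales; Jan-Mar values taken from the slide, Apr-May made up
sales = pd.DataFrame(
    {"Yt": [123, 130, 125, 138, 145]},
    index=["Jan", "Feb", "Mar", "Apr", "May"],
)

# Lag the series by one and two periods to form Yt-1 and Yt-2
sales["Yt-1"] = sales["Yt"].shift(1)
sales["Yt-2"] = sales["Yt"].shift(2)
print(sales)
# In the March row: Yt = 125, Yt-1 = 130 (February), Yt-2 = 123 (January)
```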

Autocorrelation coefficient
• Formula for computing the lag k autocorrelation coefficient (rk)
between observations Yt and Yt-k that are k periods apart:

r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}, \qquad k = 0, 1, 2, \ldots

where
rk = the autocorrelation coefficient for a lag of k periods
Ȳ = the mean of the values of the series
Yt = the observation in time period t
Yt-k = the observation k time periods earlier, i.e., at time period t - k

• Note: autocorrelation is a systematic pattern in the errors that can be
either attracting (positive) or repelling (negative).
• As the number of time lags (k) increases, the magnitude of the
autocorrelation coefficients generally decreases.
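
The formula above translates directly into a few lines of NumPy. This is
an illustrative sketch, not the textbook's own code:

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation coefficient r_k, per the formula above."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()                  # deviations from the series mean
    if k == 0:
        return 1.0                      # r_0 is always 1
    # numerator: sum over t = k+1..n of (Y_t - Ybar)(Y_{t-k} - Ybar)
    num = np.sum(dev[k:] * dev[:-k])
    # denominator: sum over t = 1..n of (Y_t - Ybar)^2
    return num / np.sum(dev ** 2)

# e.g. autocorr(sales, 1) gives r_1, autocorr(sales, 2) gives r_2
```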


Autocorrelation Range and Strength

• Autocorrelation coefficients range from -1 to 1, similar to regular
correlation coefficients:
 1 suggests perfect positive autocorrelation,
 0 suggests no autocorrelation,
 -1 suggests perfect negative autocorrelation.
• Strength:
Close to 1 or -1: Strong positive or negative
autocorrelation, respectively. It shows the series has a
definite pattern that repeats over a specified lag.
Close to 0: Weak or no autocorrelation. The series values
don't show a clear pattern of repetition over the specific lag.

Interpreting the autocorrelation coefficient


• Whether the autocorrelation is "good" or "bad" depends on the
specific application:
 Good:
- In forecasting, significant autocorrelation can be beneficial because
it indicates that past values help predict future ones.
 Bad:
- In regression residuals, we don't want autocorrelation.
- If the residuals from a regression model have significant
autocorrelation, there is some pattern the model isn't capturing, which
can lead to unreliable coefficient estimates, standard errors, and tests
of significance.


Interpreting the autocorrelation coefficient

• Significance of the autocorrelation coefficient:
 It's essential to test whether the autocorrelation is statistically
significant. Why?
 It's possible to have an autocorrelation coefficient that is not 0 yet
not statistically different from 0.

• Partial Autocorrelation:
 This is another metric used in time series analysis.
 It measures the correlation between a variable and its lag while
controlling for the values at all shorter lags.
 It is helpful for identifying the true order of an autoregressive
process (ARIMA models!).
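
One common way to obtain autocorrelations with confidence bounds, and
partial autocorrelations, is the statsmodels package. A sketch on
simulated data; the white-noise series here is purely illustrative:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(0)
y = rng.normal(size=120)          # illustrative series: pure white noise

# ACF with 95% confidence intervals; a lag whose interval excludes 0
# is statistically different from zero
r, confint = acf(y, nlags=12, alpha=0.05)

# Rough rule of thumb: |r_k| > 1.96/sqrt(n) suggests significance
threshold = 1.96 / np.sqrt(len(y))
print([k for k in range(1, 13) if abs(r[k]) > threshold])

# Partial autocorrelation: correlation at lag k controlling for shorter lags
phi = pacf(y, nlags=12)
```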

Example: Computation of the Lag 1 Autocorrelation Coefficient

Harry Vernon has collected data on the number of VCRs sold last year by
Vernon's Music Store.

• r1 = .572: the successive monthly sales of VCRs are somewhat
correlated with each other.
• The correlation between Yt and Yt-2, the autocorrelation for time
lag 2, is .463.


Correlogram or Autocorrelation Function

• A correlogram is a plot of the autocorrelations versus time lags.
• The horizontal scale on the bottom of the graph shows each time lag
of interest: 1, 2, 3, ...
• The horizontal line in the middle of the graph represents
autocorrelations of zero.
• The vertical line that extends upward above time lag 1 shows an
autocorrelation of .57, the line above lag 2 shows .46, and so on.
• The dotted lines mark the significance limits; the T (test) and LBQ
(Ljung-Box Q) statistics help judge significance.

Autocorrelations
Lag  ACF       T     LBQ
1    0.571913  1.98  5.00
2    0.462687  1.25  8.59
3    0.110583  0.27  8.82
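
A correlogram like the one described here can be drawn with statsmodels'
plot_acf. A sketch on a made-up series:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(1)
y = rng.normal(size=60).cumsum()   # made-up trending series

# Autocorrelations vs. time lags, with dotted 5% significance limits
plot_acf(y, lags=12, alpha=0.05)
plt.show()
```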


Example: high/weak autocorrelation

• Scenario: monthly average temperatures in a city with a temperate
climate.
• At a lag of 1 month (i.e., comparing each month to the previous
month), there would be a high positive autocorrelation.
• This is because, for most months, the temperature either steadily
increases or steadily decreases from the previous month, following a
seasonal pattern.


Autocorrelation Model

• Independent errors:
et does not depend on et-1 (ρ = 0)

• Autocorrelated errors: the errors are not independent;
et depends on et-1 (ρ ≠ 0)

• Assumed model: et = ρ et-1 + ut,
where ut is assumed non-autocorrelated

 With autocorrelated errors, the residuals will show a pattern over time
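
To see what the assumed model produces, here is a sketch simulating
autocorrelated errors; the value ρ = 0.8 and all other numbers are
illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
rho, n = 0.8, 200                 # assumed autocorrelation parameter
u = rng.normal(size=n)            # u_t: non-autocorrelated noise

e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]  # e_t = rho * e_{t-1} + u_t
# With rho = 0.8, plotting e shows long runs of same-signed residuals
```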



Autocorrelated Residuals

Positive Autocorrelation (common)       Negative Autocorrelation (rare)

[Figure: two residual-vs-time plots, with residuals ranging from -20 to 20]

• When a residual tends to be followed by another of the same sign, we
have positive autocorrelation. Look for runs of + + + + followed by - - - -.
• When a residual tends to be followed by another of the opposite sign,
we have negative autocorrelation. Look for an alternating + - + - pattern.


• The observation of the error term (ut) is a function of the previous
(lagged) observation of the error term (ut-1).


Detect the presence of Autocorrelation


• Ljung-Box Q (LBQ) statistic:
 The Ljung-Box test is used to check whether any
group of autocorrelations of a time series are
different from zero. It is not limited to the detection
of just first-order autocorrelation.
 The test is based on a summation of squared
autocorrelations up to a certain lag k.
 The LBQ statistic tends to increase as more lags
with significant autocorrelations are included in the
summation.
• Durbin-Watson statistic:
 The Durbin-Watson statistic primarily tests for first-
order autocorrelation in the residuals of a time
series regression model.
 The test statistic is d (the calculation is shown in the next few
slides).
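
Both tests are available in statsmodels. A minimal sketch on illustrative
residuals; since the data here is random noise, neither test should flag
autocorrelation:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
resid = rng.normal(size=100)      # illustrative regression residuals

# Ljung-Box Q: joint test that autocorrelations up to lag 10 are all zero
print(acorr_ljungbox(resid, lags=[10]))   # reports Q statistic and p-value

# Durbin-Watson: first-order autocorrelation; d near 2 means none
print(durbin_watson(resid))
```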



First-order autocorrelation
• First-order autocorrelation refers to the correlation between a
time series and its own values lagged by one period.
 If Yt represents the time series, first-order autocorrelation quantifies the
correlation between Yt and Yt-1
• Positive First-Order Autocorrelation
 If the time series has a positive first-order autocorrelation, it means that if
a particular value is above the mean, then the next value is also likely to
be above the mean. Similarly, if a value is below the mean, the next value
is likely to be below the mean.
 This often indicates a momentum or trend in the data.
• Negative First-Order Autocorrelation
 This suggests that if a value is above the mean, the next value is likely to
be below the mean, and vice versa.
 This can indicate a sort of "mean-reverting" oscillation.
• Near-Zero First-Order Autocorrelation
 This indicates that consecutive values in the time series are
essentially independent of each other.

Autocorrelations for identifying time series data patterns

• Data patterns (including trend and seasonality) can be studied using
autocorrelations.
 The autocorrelation coefficients at different time lags (r1, r2, r3, ...)
are used to identify time series data patterns.
• If a series is random,
 the autocorrelations between Yt and Yt-k for any time lag k are close
to zero.
 The successive values of a time series are not related to each other.
• If a series has a trend,
 successive observations are highly correlated, and the rk are
significantly different from zero for the first several time lags and then
gradually drop toward zero when the number of lags increases.
• If a series has a seasonal pattern,
 a significant autocorrelation coefficient will occur at the seasonal time
lag or multiples of the seasonal lag.
 The seasonal lag is 4 for quarterly data and 12 for monthly data.
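
These three signatures can be checked numerically. A sketch with three
made-up monthly series, reusing the rk formula from earlier in the chapter:

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation r_k (same formula as earlier in the chapter)."""
    dev = np.asarray(y, dtype=float) - np.mean(y)
    return np.sum(dev[k:] * dev[:-k]) / np.sum(dev ** 2)

rng = np.random.default_rng(4)
t = np.arange(120)                              # 10 years of monthly data

random_y = rng.normal(size=120)                 # random: r_k near 0
trend_y = t + rng.normal(size=120)              # trend: r_k fades slowly
seasonal_y = np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, size=120)

for name, y in [("random", random_y), ("trend", trend_y),
                ("seasonal", seasonal_y)]:
    print(name, [round(autocorr(y, k), 2) for k in (1, 6, 12)])
# random: all near 0; trend: large at every lag; seasonal: spike at lag 12
```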


Time Series Plot of Sales_1

[Figure: time series plot of Sales_1 against index 1 to 80; sales range
from roughly 0 to 400,000]


Autocorrelation Function for Sales_1
(with 5% significance limits for the autocorrelations)

[Figure: correlogram of Sales_1 for lags 1 to 60; autocorrelation axis
from -1.0 to 1.0]

Testing for Autocorrelation

• The Durbin-Watson statistic is used to test for autocorrelation:

H0: ρ = 0 (residuals are not correlated)
HA: ρ ≠ 0 (autocorrelation is present)

• Durbin-Watson test statistic:

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

• Assumed model: et = ρ et-1 + ut, where ut is assumed non-autocorrelated.
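
The statistic d translates directly into code. A minimal sketch,
illustrative rather than the textbook's own implementation:

```python
import numpy as np

def durbin_watson_d(e):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# d is near 2 when residuals are uncorrelated, near 0 under strong
# positive autocorrelation, and near 4 under strong negative autocorrelation
```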



Testing for Positive Autocorrelation

H0: ρ = 0 (positive autocorrelation does not exist)
HA: ρ > 0 (positive autocorrelation is present)

 Calculate the Durbin-Watson test statistic d.
(The Durbin-Watson statistic can be found using PHStat or Minitab.)
 Find the values dL and dU from the Durbin-Watson table
(for sample size n and number of independent variables p).

Decision rule: reject H0 if d < dL; do not reject H0 if d > dU; the test
is inconclusive if dL ≤ d ≤ dU.

[Diagram: reject H0 for 0 ≤ d < dL; inconclusive for dL ≤ d ≤ dU; do not
reject H0 for dU < d ≤ 2]



Textbook page 484

Critical Points of the Durbin-Watson Statistic: α = 0.05,
n = sample size, k = number of independent variables

      k=1          k=2          k=3          k=4          k=5
n     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
15 1.08 1.36 0.95 1.54 0.82 1.75 0.69 1.97 0.56 2.21
16 1.10 1.37 0.98 1.54 0.86 1.73 0.74 1.93 0.62 2.15
17 1.13 1.38 1.02 1.54 0.90 1.71 0.78 1.90 0.67 2.10
18 1.16 1.39 1.05 1.53 0.93 1.69 0.82 1.87 0.71 2.06
(intermediate rows omitted)
65 1.57 1.63 1.54 1.66 1.50 1.70 1.47 1.73 1.44 1.77
70 1.58 1.64 1.55 1.67 1.52 1.70 1.49 1.74 1.46 1.77
75 1.60 1.65 1.57 1.68 1.54 1.71 1.51 1.74 1.49 1.77
80 1.61 1.66 1.59 1.69 1.56 1.72 1.53 1.74 1.51 1.77
85 1.62 1.67 1.60 1.70 1.57 1.72 1.55 1.75 1.52 1.77
90 1.63 1.68 1.61 1.70 1.59 1.73 1.57 1.75 1.54 1.78
95 1.64 1.69 1.62 1.71 1.60 1.73 1.58 1.75 1.56 1.78
100 1.65 1.69 1.63 1.72 1.61 1.74 1.59 1.76 1.57 1.78

Durbin-Watson Test

Decision zones for d (which ranges from 0 to 4):
 0 to dL: conclude that positive autocorrelation exists
 dL to dU: zone of indecision
 dU to 4-dU: conclude that autocorrelation is absent
 4-dU to 4-dL: zone of indecision
 4-dL to 4: conclude that negative autocorrelation exists

Example: n = 68, dL ≈ 1.57, dU = 1.64


Testing for Positive Autocorrelation (continued)

• Example with n = 25:

[Figure: scatter plot of Sales vs. Time with fitted regression line
y = 30.65 + 4.7038x, R² = 0.8976]

Durbin-Watson Calculations
Sum of squared differences of residuals: 3296.18
Sum of squared residuals: 3279.98
Durbin-Watson statistic: 1.00494

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} = \frac{3296.18}{3279.98} = 1.00494

(Example 3.2, page 69)

Testing for Positive Autocorrelation (continued)

• Here, n = 25 and there is one independent variable.
• Using the Durbin-Watson table, dL = 1.29 and dU = 1.45.
• d = 1.00494 < dL = 1.29, so reject H0 and conclude that significant
positive autocorrelation exists.
• Therefore the linear model is not the appropriate model to forecast
sales.

Decision: reject H0 since d = 1.00494 < dL.

[Diagram: reject H0 for d < dL = 1.29; inconclusive for 1.29 ≤ d ≤ 1.45;
do not reject H0 for d > dU = 1.45]
