
Chapter 4. Correlation and Autocorrelation

Nguyen VP Nguyen, Ph.D.


Department of Industrial & Systems Engineering, HCMUT
Email: nguyennvp@hcmut.edu.vn

Correlation coefficient

• The correlation coefficient, denoted r, quantifies the degree and
direction of the linear relationship between two variables.
• The correlation coefficient ranges from -1 to 1:
 -1: perfect negative linear relationship
 0: no linear relationship
 1: perfect positive linear relationship


Interpreting the strength of the correlation

• Interpreting the strength of the correlation:
 -1.0 to -0.7: strong negative correlation
 -0.7 to -0.3: moderate negative correlation
 -0.3 to 0: weak negative correlation
 0: no correlation
 0 to 0.3: weak positive correlation
 0.3 to 0.7: moderate positive correlation
 0.7 to 1.0: strong positive correlation
• Note the following:
 Magnitude: the further r is from 0, the stronger the correlation. The
sign (positive or negative) indicates the direction of the relationship,
not its strength.
 No Causation Implication: A correlation, no matter how strong,
does not imply causation. Just because two variables move
together does not mean one caused the other.
 Type of Relationship: r measures the strength of a linear
relationship. Non-linear relationships might not be well-
represented by r.
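
As a quick illustration of computing r in practice, here is a minimal
Python sketch using NumPy; the two data series are made up for the example.

```python
import numpy as np

# Hypothetical sample data: two variables measured on the same units
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

# Pearson correlation coefficient r: off-diagonal entry of the 2x2 matrix
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")  # close to +1: strong positive linear relationship
```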

Autocorrelation
• Def 1: Autocorrelation is a correlation of the values of a variable
with values of the same variable lagged one or more periods back.
• Consequences of autocorrelation include inaccurate estimates of
variances and inaccurate predictions.
• Autocorrelation violates the regression assumption that residuals are
random and independent.

[Figure: Time (t) residual plot. The residuals show a cyclic pattern
over time, not a random one.]

• Other definitions:
 Def 2: Autocorrelation is the 'lag correlation of a given series
within itself, lagged by a number of time units'.
 Def 3: Serial correlation is the 'lag correlation between two
different series'.


Autocorrelation illustration
• Autocorrelation is the correlation between a variable lagged one or
more time periods and itself.
 Data patterns, including components such as trend and seasonality,
can be studied using autocorrelations.
• The time index t stays the same across Yt, Yt-1, and Yt-2.
• The variables Yt-1 and Yt-2 are actually the Y values that have been
lagged by one and two time periods, respectively.
• For example, in the row for March: Yt = 125 (March sales), Yt-1 = 130
(February sales), and Yt-2 = 123 (January sales).
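
A minimal sketch of how such lagged columns can be built with pandas. The
January-to-March values match the slide's example; the remaining months
are invented for illustration.

```python
import pandas as pd

# Monthly sales; Jan-Mar values taken from the slide, Apr-May made up
sales = pd.DataFrame(
    {"Yt": [123, 130, 125, 138, 145]},
    index=["Jan", "Feb", "Mar", "Apr", "May"],
)

# Lag the series by one and two periods to form Yt-1 and Yt-2
sales["Yt-1"] = sales["Yt"].shift(1)
sales["Yt-2"] = sales["Yt"].shift(2)
print(sales)
# In the March row: Yt = 125, Yt-1 = 130 (February), Yt-2 = 123 (January)
```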

Autocorrelation coefficient
• Formula for computing the lag k autocorrelation coefficient (rk)
between observations Yt and Yt-k that are k periods apart:

r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}, \qquad k = 0, 1, 2, \ldots

where
rk = the autocorrelation coefficient for a lag of k periods
Ȳ = the mean of the values of the series
Yt = the observation in time period t
Yt-k = the observation k time periods earlier, i.e., at time period t - k

• Note: autocorrelation is a systematic pattern in the errors that can be
either attracting (positive) or repelling (negative).
• As the number of time lags (k) increases, the magnitude of the
autocorrelation coefficients generally decreases.
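
The formula above translates directly into a few lines of NumPy. This is
an illustrative sketch, not the textbook's own code:

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation coefficient r_k, per the formula above."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()                  # deviations from the series mean
    if k == 0:
        return 1.0                      # r_0 is always 1
    # numerator: sum over t = k+1..n of (Y_t - Ybar)(Y_{t-k} - Ybar)
    num = np.sum(dev[k:] * dev[:-k])
    # denominator: sum over t = 1..n of (Y_t - Ybar)^2
    return num / np.sum(dev ** 2)

# e.g. autocorr(sales, 1) gives r_1, autocorr(sales, 2) gives r_2
```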


Autocorrelation Range and Strength

• Autocorrelation coefficients range from -1 to 1, similar to regular
correlation coefficients:
 1 suggests perfect positive autocorrelation,
 0 suggests no autocorrelation,
 -1 suggests perfect negative autocorrelation.
• Strength:
Close to 1 or -1: Strong positive or negative
autocorrelation, respectively. It shows the series has a
definite pattern that repeats over a specified lag.
Close to 0: Weak or no autocorrelation. The series values
don't show a clear pattern of repetition over the specific lag.

Interpreting the autocorrelation coefficient


• Whether the autocorrelation is "good" or "bad" depends on the
specific application:
 Good:
- In forecasting, significant autocorrelation can be beneficial because
it indicates that past values help predict future ones.
 Bad:
- In regression residuals, we don't want autocorrelation.
- If the residuals from a regression model have significant
autocorrelation, there is some pattern the model isn't capturing, which
can lead to unreliable coefficient estimates, standard errors, and tests
of significance.


Interpreting the autocorrelation coefficient

• Significance of the autocorrelation coefficient:
 It's essential to test whether the autocorrelation is statistically
significant. Why?
 It's possible to have an autocorrelation coefficient that is not 0 yet
not statistically different from 0.

• Partial Autocorrelation:
 This is another metric used in time series analysis.
 It measures the correlation between a variable and its lag while
controlling for the values at all shorter lags.
 It is helpful for identifying the true order of an autoregressive
process (ARIMA models!).
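
One common way to obtain autocorrelations with confidence bounds, and
partial autocorrelations, is the statsmodels package. A sketch on
simulated data; the white-noise series here is purely illustrative:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(0)
y = rng.normal(size=120)          # illustrative series: pure white noise

# ACF with 95% confidence intervals; a lag whose interval excludes 0
# is statistically different from zero
r, confint = acf(y, nlags=12, alpha=0.05)

# Rough rule of thumb: |r_k| > 1.96/sqrt(n) suggests significance
threshold = 1.96 / np.sqrt(len(y))
print([k for k in range(1, 13) if abs(r[k]) > threshold])

# Partial autocorrelation: correlation at lag k controlling for shorter lags
phi = pacf(y, nlags=12)
```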

Example: Computation of the Lag 1 Autocorrelation Coefficient

Harry Vernon has collected data on the number of VCRs sold last year by
Vernon's Music Store.

• r1 = .572: the successive monthly sales of VCRs are somewhat
correlated with each other.
• The correlation between Yt and Yt-2, the autocorrelation for time
lag 2, is .463.


Correlogram or Autocorrelation Function

• A correlogram is a plot of the autocorrelations versus time lags.
• The horizontal scale on the bottom of the graph shows each time lag
of interest: 1, 2, 3, ...
• The horizontal line in the middle of the graph represents
autocorrelations of zero.
• The vertical line that extends upward above time lag 1 shows an
autocorrelation of .57, the line above lag 2 shows .46, and so on.
• The dotted lines mark the significance limits; the T (test) and LBQ
(Ljung-Box Q) statistics help judge significance.

Autocorrelations
Lag  ACF       T     LBQ
1    0.571913  1.98  5.00
2    0.462687  1.25  8.59
3    0.110583  0.27  8.82
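
A correlogram like the one described here can be drawn with statsmodels'
plot_acf. A sketch on a made-up series:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(1)
y = rng.normal(size=60).cumsum()   # made-up trending series

# Autocorrelations vs. time lags, with dotted 5% significance limits
plot_acf(y, lags=12, alpha=0.05)
plt.show()
```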


Example: high/weak autocorrelation

• Scenario: monthly average temperatures in a city with a temperate
climate.
• At a lag of 1 month (i.e., comparing each month to the previous
month), there would be a high positive autocorrelation.
• This is because, for most months, the temperature either steadily
increases or steadily decreases from the previous month, following a
seasonal pattern.


Autocorrelation Model

• Independent errors:
et does not depend on et-1 (ρ = 0)

• Autocorrelated errors: the errors are not independent;
et depends on et-1 (ρ ≠ 0)

• Assumed model: et = ρ et-1 + ut,
where ut is assumed non-autocorrelated

 With autocorrelated errors, the residuals will show a pattern over time
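
To see what the assumed model produces, here is a sketch simulating
autocorrelated errors; the value ρ = 0.8 and all other numbers are
illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
rho, n = 0.8, 200                 # assumed autocorrelation parameter
u = rng.normal(size=n)            # u_t: non-autocorrelated noise

e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]  # e_t = rho * e_{t-1} + u_t
# With rho = 0.8, plotting e shows long runs of same-signed residuals
```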



Autocorrelated Residuals

Positive Autocorrelation (common)       Negative Autocorrelation (rare)

[Figure: two residual-vs-time plots, with residuals ranging from -20 to 20]

• When a residual tends to be followed by another of the same sign, we
have positive autocorrelation. Look for runs of + + + + followed by - - - -.
• When a residual tends to be followed by another of the opposite sign,
we have negative autocorrelation. Look for an alternating + - + - pattern.


• The observation of the error term (ut) is a function of the previous
(lagged) observation of the error term (ut-1).


Detect the presence of Autocorrelation


• Ljung-Box Q (LBQ) statistic:
 The Ljung-Box test is used to check whether any
group of autocorrelations of a time series are
different from zero. It is not limited to the detection
of just first-order autocorrelation.
 The test is based on a summation of squared
autocorrelations up to a certain lag k.
 The LBQ statistic tends to increase as more lags
with significant autocorrelations are included in the
summation.
• Durbin-Watson statistic:
 The Durbin-Watson statistic primarily tests for first-
order autocorrelation in the residuals of a time
series regression model.
 The test statistic is d (the calculation is shown in the next few
slides).
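
Both tests are available in statsmodels. A minimal sketch on illustrative
residuals; since the data here is random noise, neither test should flag
autocorrelation:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
resid = rng.normal(size=100)      # illustrative regression residuals

# Ljung-Box Q: joint test that autocorrelations up to lag 10 are all zero
print(acorr_ljungbox(resid, lags=[10]))   # reports Q statistic and p-value

# Durbin-Watson: first-order autocorrelation; d near 2 means none
print(durbin_watson(resid))
```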



First-order autocorrelation
• First-order autocorrelation refers to the correlation between a
time series and its own values lagged by one period.
 If Yt represents the time series, first-order autocorrelation quantifies the
correlation between Yt and Yt-1
• Positive First-Order Autocorrelation
 If the time series has a positive first-order autocorrelation, it means that if
a particular value is above the mean, then the next value is also likely to
be above the mean. Similarly, if a value is below the mean, the next value
is likely to be below the mean.
 This often indicates a momentum or trend in the data.
• Negative First-Order Autocorrelation
 This suggests that if a value is above the mean, the next value is likely to
be below the mean, and vice versa.
 This can indicate a sort of "mean-reverting" oscillation.
• Near-Zero First-Order Autocorrelation
 This indicates that consecutive values in the time series are
essentially independent of each other.

Autocorrelations for identifying time series data patterns

• Data patterns (including trend and seasonality) can be studied using
autocorrelations.
 The autocorrelation coefficients at different time lags (r1, r2, r3, ...)
are used to identify time series data patterns.
• If a series is random,
 the autocorrelations between Yt and Yt-k for any time lag k are close
to zero.
 The successive values of a time series are not related to each other.
• If a series has a trend,
 successive observations are highly correlated, and the rk are
significantly different from zero for the first several time lags and then
gradually drop toward zero when the number of lags increases.
• If a series has a seasonal pattern,
 a significant autocorrelation coefficient will occur at the seasonal time
lag or multiples of the seasonal lag.
 The seasonal lag is 4 for quarterly data and 12 for monthly data.
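
These three signatures can be checked numerically. A sketch with three
made-up monthly series, reusing the rk formula from earlier in the chapter:

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation r_k (same formula as earlier in the chapter)."""
    dev = np.asarray(y, dtype=float) - np.mean(y)
    return np.sum(dev[k:] * dev[:-k]) / np.sum(dev ** 2)

rng = np.random.default_rng(4)
t = np.arange(120)                              # 10 years of monthly data

random_y = rng.normal(size=120)                 # random: r_k near 0
trend_y = t + rng.normal(size=120)              # trend: r_k fades slowly
seasonal_y = np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, size=120)

for name, y in [("random", random_y), ("trend", trend_y),
                ("seasonal", seasonal_y)]:
    print(name, [round(autocorr(y, k), 2) for k in (1, 6, 12)])
# random: all near 0; trend: large at every lag; seasonal: spike at lag 12
```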


Time Series Plot of Sales_1

[Figure: time series plot of Sales_1 against index 1 to 80; sales range
from roughly 0 to 400,000]


Autocorrelation Function for Sales_1
(with 5% significance limits for the autocorrelations)

[Figure: correlogram of Sales_1 for lags 1 to 60; autocorrelation axis
from -1.0 to 1.0]

Testing for Autocorrelation

• The Durbin-Watson statistic is used to test for autocorrelation:

H0: ρ = 0 (residuals are not correlated)
HA: ρ ≠ 0 (autocorrelation is present)

• Durbin-Watson test statistic:

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

• Assumed model: et = ρ et-1 + ut, where ut is assumed non-autocorrelated.
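
The statistic d translates directly into code. A minimal sketch,
illustrative rather than the textbook's own implementation:

```python
import numpy as np

def durbin_watson_d(e):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# d is near 2 when residuals are uncorrelated, near 0 under strong
# positive autocorrelation, and near 4 under strong negative autocorrelation
```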



Testing for Positive Autocorrelation

H0: ρ = 0 (positive autocorrelation does not exist)
HA: ρ > 0 (positive autocorrelation is present)

 Calculate the Durbin-Watson test statistic d.
(The Durbin-Watson statistic can be found using PHStat or Minitab.)
 Find the values dL and dU from the Durbin-Watson table
(for sample size n and number of independent variables p).

Decision rule: reject H0 if d < dL; do not reject H0 if d > dU; the test
is inconclusive if dL ≤ d ≤ dU.

[Diagram: reject H0 for 0 ≤ d < dL; inconclusive for dL ≤ d ≤ dU; do not
reject H0 for dU < d ≤ 2]



Textbook page 484

Critical Points of the Durbin-Watson Statistic: α = 0.05,
n = sample size, k = number of independent variables

      k=1          k=2          k=3          k=4          k=5
n     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
15 1.08 1.36 0.95 1.54 0.82 1.75 0.69 1.97 0.56 2.21
16 1.10 1.37 0.98 1.54 0.86 1.73 0.74 1.93 0.62 2.15
17 1.13 1.38 1.02 1.54 0.90 1.71 0.78 1.90 0.67 2.10
18 1.16 1.39 1.05 1.53 0.93 1.69 0.82 1.87 0.71 2.06
(intermediate rows omitted)
65 1.57 1.63 1.54 1.66 1.50 1.70 1.47 1.73 1.44 1.77
70 1.58 1.64 1.55 1.67 1.52 1.70 1.49 1.74 1.46 1.77
75 1.60 1.65 1.57 1.68 1.54 1.71 1.51 1.74 1.49 1.77
80 1.61 1.66 1.59 1.69 1.56 1.72 1.53 1.74 1.51 1.77
85 1.62 1.67 1.60 1.70 1.57 1.72 1.55 1.75 1.52 1.77
90 1.63 1.68 1.61 1.70 1.59 1.73 1.57 1.75 1.54 1.78
95 1.64 1.69 1.62 1.71 1.60 1.73 1.58 1.75 1.56 1.78
100 1.65 1.69 1.63 1.72 1.61 1.74 1.59 1.76 1.57 1.78

Durbin-Watson Test

Decision zones for d (which ranges from 0 to 4):
 0 to dL: conclude that positive autocorrelation exists
 dL to dU: zone of indecision
 dU to 4-dU: conclude that autocorrelation is absent
 4-dU to 4-dL: zone of indecision
 4-dL to 4: conclude that negative autocorrelation exists

Example: n = 68, dL ≈ 1.57, dU = 1.64


Testing for Positive Autocorrelation (continued)

• Example with n = 25:

[Figure: scatter plot of Sales vs. Time with fitted regression line
y = 30.65 + 4.7038x, R² = 0.8976]

Durbin-Watson Calculations
Sum of squared differences of residuals: 3296.18
Sum of squared residuals: 3279.98
Durbin-Watson statistic: 1.00494

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} = \frac{3296.18}{3279.98} = 1.00494

(Example 3.2, page 69)

Testing for Positive Autocorrelation (continued)

• Here, n = 25 and there is one independent variable.
• Using the Durbin-Watson table, dL = 1.29 and dU = 1.45.
• d = 1.00494 < dL = 1.29, so reject H0 and conclude that significant
positive autocorrelation exists.
• Therefore the linear model is not the appropriate model to forecast
sales.

Decision: reject H0 since d = 1.00494 < dL.

[Diagram: reject H0 for d < dL = 1.29; inconclusive for 1.29 ≤ d ≤ 1.45;
do not reject H0 for d > dU = 1.45]
