Introduction To Forecasting
$\sum_{i=1}^{L} S_i = 0$
- Multiplicative: $E\{X_t\} = f(t; \beta_1, \beta_2, \ldots)\,S_t$ ; $\sum_{i=1}^{L} S_i = L$
- where $S_t = S_{t+L} = S_{t+2L} = \cdots$ and $f(t; \beta_0, \beta_1, \ldots)$ is a function describing the trend.
Each observation period is called a season, $L$ is the length of seasonality and $S_t$ are seasonal indexes.
- Cycles are an irregular tendency to oscillate at an almost fixed frequency.
- An outlier (or influential point) is an unusual observation.
Figure 5 Some basic patterns: (A) constant process, (B) pulse, (C) ramp, (D) step; each shown with no error and with random error
Figure 6 Patterns based on Pegels' 1969 classification: rows (A) no trend effect, (B) additive trend, (C) multiplicative trend; columns (1) no seasonal effect, (2) additive seasonal, (3) multiplicative seasonal
Figure 7 Seasonality with linear trend: quarterly series illustrating additive seasonality and multiplicative seasonality
Before building mathematical models for predicting time series, a forecaster should investigate
the basic structure of the time series. For instance:
- Is the series cyclic?
- Does it increase (or decrease) over time?
- Does the variability remain constant over time?
- Are there carry-over effects from previous values of the series?
- How long do these effects last?
To help answer some of these questions there are a few useful and simple graphical techniques
that can be used.
7 Plotting Data
The first step in the analysis of any time series is to plot the data against time. This immediately
reveals any trend over time, any regular seasonal behaviour and other systematic features of the
data. These need to be identified so that we can incorporate them into our mathematical model later.
Another useful graph is the simple histogram. This reveals whether the data are skewed or
roughly symmetric. Most time series models require the data to have roughly symmetric
histograms. We will consider ways of transforming data to make it symmetric shortly.
7.1 Descriptive Statistics
It is often helpful to calculate several statistics from a time series. For example, the mean and
standard deviation are very useful in measuring the location and spread of the data.
Another statistic we shall use is the coefficient of skewness. This is a measure of how skewed
the histogram of the data is. Positive skewness indicates the histogram has a long right-hand tail,
negative skewness indicates the histogram has a long left-hand tail. Symmetric data have zero
skewness. These statistics are calculated (for large sample sizes) as follows:
Mean:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Standard Deviation:
$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2}$
Coefficient of Skewness:
$v = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^3}{s^3} = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i^3 - 3\bar{X}\,\frac{1}{n}\sum_{i=1}^{n} X_i^2 + 2\bar{X}^3}{s^3}$
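These formulas translate directly into code. As a quick check outside ITSM, here is a minimal Python sketch (the data values are invented purely for illustration):

```python
import numpy as np

def describe(x):
    """Mean, standard deviation and coefficient of skewness,
    using the large-sample (divide-by-n) formulas above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.sum() / n
    s = np.sqrt(((x - mean) ** 2).sum() / n)   # divide by n, not n - 1
    v = ((x - mean) ** 3).sum() / n / s ** 3   # positive => long right tail
    return mean, s, v

data = [12.0, 15.0, 11.0, 18.0, 30.0]          # invented values
m, s, v = describe(data)
print(m, round(s, 3), round(v, 3))             # the 30 gives positive skew
```

For large $n$ the divide-by-$n$ and divide-by-$(n-1)$ versions are practically identical, which is why the large-sample forms above are used.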
7.2 Plots and Statistics using ITSM
With ITSM the data can be plotted by selecting New Project, followed by Import Data, and selecting the ASCII file; a graph of the data vs. time then appears on the screen. The project can then be saved, e.g. as Name.tsm (the tsm extension is added automatically).
Histograms may be plotted from several submenus. By selecting the appropriate menu the associated statistics are displayed.
Example: To illustrate this procedure, we shall use the data file Beer.tsm.[5] This is the Australian quarterly beer production in megalitres from March 1956 to June 2010 (after which the data were no longer published). Using ITSM, we can then obtain a time plot similar to that shown in Figure 9.
A histogram of the data is given by clicking on the histogram button, or by using the menu entry
Statistics, followed by histogram.
Clicking on the INFO button (while the time series
graph is active) gives the following information:
# of Data Points = 218
Sample Mean = .4154E+03
Sample Variance = .734166E+04
Std.Error(Sample Mean) = 17.861581
(square root of (1/n)SUM{(1-|h|/r)acvf(h)},
|h|<r=[sqrt(n)])
MODEL:
ARMA Model:
X(t) = Z(t)
WN Variance = 1.000000
Garch Model for Z(t):
Z(t) = sqrt(h(t)) e(t) h(t) = 1.000000
{e(t)} is IID N(0,1)
Figure 8 Histogram for Beer data
8 Modelling Approach
From the histogram and time plot, various features of the data become apparent. Seasonal
behaviour, long-term trends, changes in level and variability over time are frequently observed.
Example: For the beer data, we observe that
- the mean
- the variation
- a systematic component
- changes
In forecasting, we need a model that incorporates these basic features of the data. The basic
approach we adopt is to systematically eliminate each of these features until we have totally
featureless data.
A time series that has no trend or seasonality and has mean, variance and covariances constant
over time is called stationary. The first stage in eliminating data features is to make the series
stationary. Once we have a stationary series, we consider other features that are not so apparent
to the naked eye. Having incorporated these into our model as well, we will finally be able to
forecast!
Figure 9 Quarterly Australian beer production in megalitres from March 1956 to June 2010
[ITSM toolbar: plot histogram; move between available windows; carry out transformations (Box-Cox, difference data, remove trend, etc.); select a subset of the data (trim); reverse transformations; information on the current window; plot sample ACF/PACF; tests for randomness]
Figure 10 Pre-mixed concrete production ('000 cubic metres), February 1976 to March 2014
Figure 11 Logarithm of Pre-mixed concrete data
9 Box-Cox Transformations
Different techniques are used for eliminating each type of feature. One such technique
is the Box-Cox transformation which may let you turn skewed data into symmetric data
and make the variance constant over time. If the original observations are $Y_1, Y_2, \ldots, Y_n$, the Box-Cox transformation $f_\lambda$ converts them to $Z_1, Z_2, \ldots, Z_n$, where $Z_t = f_\lambda(Y_t)$ and
$f_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \lambda \neq 0 \\ \log y, & \lambda = 0 \end{cases}$
These transformations are especially useful when the variability of the data increases or decreases with the level. By suitable choice of $\lambda$, the variability can often be made nearly constant. In particular, for positive data with standard deviation increasing linearly with level, the variability can be stabilised by choosing $\lambda = 0$.
A choice of $\lambda < 1$ will decrease skewness while $\lambda > 1$ will increase skewness.
The choice of $\lambda$ can be made by trial and error, using the graphs of the transformed data. (After inspecting the graph you can use the sliding scale to try various values of $\lambda$.) Very often it is found that no transformation is needed or that the choice $\lambda = 0$ is satisfactory.
Logarithms are common in economics, for example, and indicate that percentage
changes are important.
Where the time series contains zero or negative values, it is necessary to add a constant
to the series to make all values positive.
The Box-Cox transformation can be carried out using ITSM by selecting that option
from the Data Menu. The transformation may be reversed.
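The transformation itself is easy to reproduce outside ITSM. The sketch below implements the definition above directly in Python; the generated series is illustrative only (scipy.stats.boxcox additionally offers a maximum-likelihood choice of $\lambda$):

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, log(y) for
    lam == 0.  Requires strictly positive data."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

# Illustrative series whose spread grows with its level:
rng = np.random.default_rng(0)
level = np.linspace(10.0, 100.0, 200)
y = level * rng.lognormal(sigma=0.1, size=200)
z = box_cox(y, 0)            # lam = 0: same shape as np.log(y)
assert np.allclose(z, np.log(y))
```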
Example: For the concrete data the variability increases with level and the data are strictly positive. Taking natural logarithms (for instance, choosing a Box-Cox transformation with $\lambda = 0$) gives the transformed data shown in Figure 11. (Using ITSM, open the univariate data file Pre-MixedConcrete.tsm, select Transform, then the Box-Cox transformation option, move the parameter pointer (which ranges between 0 and 1.5) and observe how the graph changes.)
The characteristics of the transformed series are,
- the mean
- the variation
- a systematic component
Notice how the variation no longer increases. The seasonal effect remains, as does the upward trend. These need to be handled (using other techniques which shall be discussed later). If the log transformation has stabilised the variability, it is not necessary to consider other values of $\lambda$. Note that the data stored in ITSM would now consist of the natural logarithms of the original data.
10 Sample autocorrelation and partial autocorrelation functions
When concerned about independence in a time series it is natural to calculate the correlation coefficient between successive observations in the series.
For the house price data shown earlier, a scatter plot of each observation against the previous one ($Y_t$ against $Y_{t-1}$) is shown below, the correlation between adjacent observations being 0.998. The scatter plot of each observation against the observation before the previous one ($Y_t$ against $Y_{t-2}$) is also shown below, the correlation between observations two lags apart being 0.9703. For observations three, four and five lags apart the correlations are 0.9357, 0.8978 and 0.8561 respectively.
Figure 12 House prices: scatter plots of $Y_t$ against $Y_{t-1}$, $Y_{t-2}$ and $Y_{t-3}$ (using Minitab draftsman plots)
The correlogram is a plot of the sample autocorrelation function (acf) against lags 1 to 40.
Figure 13 Correlogram of house price data
Figure 14 Correlogram for random data (n = 100) simulated with seed 12345
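The ingredients of a correlogram can be computed in a few lines. The following Python sketch implements the sample ACF from the usual sample autocovariances; the strongly trending series is illustrative only:

```python
import numpy as np

def sample_acf(x, max_lag=40):
    """Sample ACF r_k = c_k / c_0 with
    c_k = (1/n) * sum_t (x_t - xbar)(x_{t+k} - xbar)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    c0 = (d * d).sum() / n
    return np.array([(d[:n - k] * d[k:]).sum() / n / c0
                     for k in range(max_lag + 1)])

# A strongly trending series has an ACF that dies out very slowly:
r = sample_acf(np.arange(100, dtype=float), max_lag=5)
# r[0] is 1 by construction; r[1] is close to 1 for trending data.
```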
For a series in which $Y_t$ is correlated with $Y_{t-1}$ (say with correlation $\rho$), $Y_t$ will also be correlated with $Y_{t-2}$ (with correlation $\rho^2$). We can determine the degree of correlation between $Y_t$ and $Y_{t-2}$ after adjustment for the first-order correlation. This produces the partial autocorrelation function (PACF). There are two basic methods: one involves regressing $Y_t$ on $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k}$, so that the partial autocorrelation coefficient $\phi_{kk}$ for lag $k$ is the coefficient of $Y_{t-k}$. The other method calculates the coefficients recursively.[6]
Note that $\rho_0 = \phi_{00} = 1$ and $\rho_1 = \phi_{11}$.
Figure 15 ACF & PACF for the house price data
Suppose we were to difference the data at lag 1, letting $Y_t = X_t - X_{t-1}$.
Figure 16 Time series plot for the first differences in house prices
Figure 17 ACF & PACF for the first differences in house prices
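Differencing itself is one line of code. A small Python sketch, with invented series showing that lag-1 differencing removes a linear trend and lag-4 differencing removes a quarterly pattern:

```python
import numpy as np

def diff(x, lag=1):
    """Difference a series at the given lag: y_t = x_t - x_{t-lag}."""
    x = np.asarray(x, dtype=float)
    return x[lag:] - x[:-lag]

# A linear trend disappears after differencing once at lag 1:
x = 3.0 * np.arange(10) + 5.0
y = diff(x, 1)                               # constant series of 3s
assert np.allclose(y, 3.0)

# A quarterly pattern disappears after differencing at lag 4:
season = np.tile([2.0, -1.0, 0.5, -1.5], 6)  # illustrative pattern
assert np.allclose(diff(season, 4), 0.0)
```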
Figure 18 ACF/PACF of ln(Portland cement), quarterly from March 1956 to March 2014
Figure 19 ACF/PACF of quarterly Sparkling wine from March 1980 to March 2014
Testing whether a particular autocorrelation or partial autocorrelation coefficient is zero can be done using a rule of thumb.[7]
ACF: test $H_0: \rho_k = 0$ against $H_1: \rho_k \neq 0$ using $\hat{\rho}_k \sim$ approx. $N(0, 1/n)$
PACF: test $H_0: \phi_{kk} = 0$ against $H_1: \phi_{kk} \neq 0$ using $\hat{\phi}_{kk} \sim$ approx. $N(0, 1/n)$
Some common patterns in the ACF are:
1. Very slowly dying out, like Figure 18: evidence of trend in the series.
2. Spikes at lag L and at multiples of lag L, like Figures 19 and 21: evidence of seasonality.
Figure 20 Time series plot of ln(Beer) after removing quadratic trend
# of data points = 213; sample mean = 6.0048; sample variance = 0.051654
Figure 21 ACF/PACF of ln(Beer) after removing quadratic trend
Figure 22 Time series plot of ln(Beer) after differencing at lag 4
Figure 23 ACF/PACF of ln(Beer) after differencing at lag 4
# of data points = 209; sample mean = 0.0097; sample variance = 0.001987
Figure 24 Time series plot of ln(Beer) after differencing at lag 4 and at lag 1
Figure 25 ACF/PACF of ln(Beer) after differencing at lag 4 and at lag 1
11 White noise
Many time series models are defined using white noise processes. Borrowing from
engineering terminology a time series may be considered to consist of a signal and an
irregular noise component. If the noise component is a stationary series of uncorrelated
random variables having zero mean and constant variance it is referred to as white
noise. Equivalently a white noise process has an autocorrelation function:
$\rho(k) = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases}$
The process is often written $Y_t \sim \mathrm{WN}(0, \sigma^2)$ or $Y_t \sim \mathrm{IID}(0, \sigma^2)$, or $Y_t \sim \mathrm{NID}(0, \sigma^2)$ if the process is Gaussian, i.e. normally distributed. The most common purpose of white noise tests is to check the residuals after fitting a model to some time series data. The residuals are defined as the h-step prediction errors
$e_t = Y_t - \hat{Y}_{t|t-h}$
where $Y_t$ is the actual observation at time $t$ and $\hat{Y}_{t|t-h}$ is its forecast made $h$ steps earlier.
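One informal way to screen residuals against white noise is to count how many sample autocorrelations fall outside the $\pm 1.96/\sqrt{n}$ bounds. A rough Python sketch (the 25% cut-off is an arbitrary, deliberately lenient illustrative choice, not ITSM's rule):

```python
import numpy as np

def sample_acf_1(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    n, d = len(x), x - np.mean(x)
    c0 = (d * d).sum() / n
    return np.array([(d[:n - k] * d[k:]).sum() / n / c0
                     for k in range(1, max_lag + 1)])

def looks_like_white_noise(x, max_lag=20):
    """Crude screen: for white noise only about 5% of sample
    autocorrelations should fall outside +/- 1.96/sqrt(n)."""
    bound = 1.96 / np.sqrt(len(x))
    r = sample_acf_1(x, max_lag)
    return float(np.mean(np.abs(r) > bound)) <= 0.25   # lenient cut-off

rng = np.random.default_rng(12345)
noise = rng.normal(0.0, 1.0, 500)              # NID(0, 1)
trend = np.arange(500, dtype=float)            # certainly not white noise
```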
1.18
12.5 Turning points test of randomness
Counting the number of 'peaks' and 'troughs' that appear in the data gives the number of turning points, T, which is approximately normally distributed with $\mu_T = \frac{2}{3}(n-2)$ and $\sigma_T^2 = \frac{16n-29}{90}$. ITSM includes this statistic in the set of tests for randomness.
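The test statistic can be sketched directly from these definitions (the series used are illustrative only):

```python
import math
import numpy as np

def turning_points_z(x):
    """Standardised turning points statistic (T - mu_T) / sigma_T;
    |z| > 1.96 casts doubt on randomness at the 5% level."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = sum(1 for i in range(1, n - 1)
            if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > 0)  # peak or trough
    mu = 2.0 * (n - 2) / 3.0
    sigma = math.sqrt((16.0 * n - 29.0) / 90.0)
    return (t - mu) / sigma

rng = np.random.default_rng(1)
z_random = turning_points_z(rng.normal(size=300))   # should be small
z_trend = turning_points_z(np.arange(300.0))        # no turning points at all
```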
12.6 Phase length test of randomness
The interval between two turning points is called a phase. We count the number of times each phase length occurs and compare it with the expected number of occurrences of each phase length $d$ in a random series of length $n$, given by
$\dfrac{2(n-d-2)(d^2+3d+1)}{(d+3)!}$
Since the usual $\chi^2$ test is invalid because the phase lengths are not independent, a modified $\chi^2$ test is suggested, using $2\tfrac{1}{2}$ degrees of freedom for values > 6.3 and $\tfrac{6}{7}\chi^2$ with two degrees of freedom for lower values. (See Kendall, pages 25 and 26, for details.)
12.7 Rank tests
There are a number of rank tests which are also used to detect non-randomness. For instance, the rank correlation coefficient, sometimes known as Kendall's $\tau$, is based on P, the number of pairs $(u_i, u_j)$, $j > i$, in the series for which $u_j > u_i$. P is approximately normally distributed with mean $\frac{1}{4}n(n-1)$ and variance $\frac{n(n-1)(2n+5)}{72}$. It can be shown that
$\tau = \dfrac{4P}{n(n-1)} - 1$
has mean zero and variance $\dfrac{2(2n+5)}{9n(n-1)}$. It is the statistic P that is used in the ITSM programme.
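A brute-force sketch of the standardised P statistic (O(n²) pair counting, fine for short series; the monotone series below are purely illustrative):

```python
import math

def kendall_p_z(u):
    """Standardised P: the number of pairs (i, j), j > i, with
    u_j > u_i, compared with its null mean n(n-1)/4 and variance
    n(n-1)(2n+5)/72 (no ties assumed)."""
    n = len(u)
    p = sum(1 for i in range(n) for j in range(i + 1, n) if u[j] > u[i])
    mean = n * (n - 1) / 4.0
    var = n * (n - 1) * (2.0 * n + 5.0) / 72.0
    return (p - mean) / math.sqrt(var)

z_up = kendall_p_z(list(range(50)))          # steadily rising: P at maximum
z_down = kendall_p_z(list(range(50, 0, -1))) # steadily falling: P = 0
```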
12.8 Spectral representation of a random time series
A time series can also be thought of as a weighted sum of cyclic functions (sines and cosines, for instance). If every frequency contributed equally to the series, a plot of the contributions by frequency would be flat. This plot is called a spectrum.
For a random series, the spectrum would resemble:
Figure 26 Model spectrum for a random series (power against frequency: flat)
We will see later how to interpret sample spectra. For the ln(Beer) series differenced at lag 4, choosing Spectrum, then Cumulative periodogram, produces
Figure 27 Cumulative periodogram for ln(Beer) differenced at lag 4
And choosing Spectrum, then Fisher's test, shows that we should reject the hypothesis that all frequencies contribute equally to the variance of the series (so the data in Figure 22 are not random).
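The cumulative periodogram itself is straightforward to compute. A Python sketch using NumPy's FFT (an illustration of the idea, not ITSM's exact implementation):

```python
import numpy as np

def cumulative_periodogram(x):
    """Running sum of the periodogram ordinates at the Fourier
    frequencies, normalised to end at 1.  For white noise the plot
    climbs roughly linearly from 0 to 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2
    spec = spec[1:n // 2 + 1]       # drop the zero-frequency ordinate
    c = np.cumsum(spec)
    return c / c[-1]

rng = np.random.default_rng(7)
c = cumulative_periodogram(rng.normal(size=400))   # 200 ordinates
```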
12.9 Test of Normality
Non-normality of the residuals may first be assessed using a histogram
(or stem and leaf plot) and checked more carefully using (for instance in Minitab)
NSCORES of the data in C store in C
PLOT normal scores in C with the data in C
which should be linear. The data are considered to be non-normal if the correlation
measure of linearity
CORR normal scores in C with the data in C
falls below the critical values:
Sample size, n   Significance level
                  0.01    0.05    0.10
 50               0.966   0.976   0.981
 60               0.971   0.980   0.984
 75               0.976   0.984   0.987
100               0.981   0.986   0.989
150               0.987   0.991   0.992
200               0.990   0.993   0.994
Table 3 Critical values for the normal scores test
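The normal-scores correlation measure can be approximated in a few lines. A Python sketch using the Blom plotting positions (an assumption on my part; Minitab's exact NSCORES formula may differ slightly):

```python
import numpy as np
from statistics import NormalDist

def normal_scores_corr(x):
    """Correlation between the ordered data and approximate normal
    scores computed from Blom plotting positions (i - 3/8)/(n + 1/4).
    Values well below 1 point to non-normality (compare Table 3)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    nd = NormalDist()
    scores = np.array([nd.inv_cdf((i - 0.375) / (n + 0.25))
                       for i in range(1, n + 1)])
    return float(np.corrcoef(x, scores)[0, 1])

rng = np.random.default_rng(3)
r_normal = normal_scores_corr(rng.normal(size=200))       # near 1
r_skewed = normal_scores_corr(rng.exponential(size=200))  # noticeably lower
```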
Normality tests available in Excel include the Lilliefors test; for instance, for the house price data with the histogram shown below:
Figure 28 Histogram of the house price data
Figure 29 Testing normality using the StatPro Add-Ins in Excel
The tests for randomness described above are all valid even if the data are not normally distributed.
13 Self assessment exercises
Use ITSM to duplicate the plots.
Forecasting 1
Week 1: Introduction to time series ....................................................................................... 1
1 Objectives ......................................................................................................................... 1
2 Introduction ..................................................................................................................... 1
3 Classifying forecasting problems ................................................................................... 3
3.1 Subjective methods .................................................................................................. 3
3.2 Univariate methods .................................................................................................. 4
3.3 Multivariate methods ............................................................................................... 4
3.4 Categories of Forecasting Methods and Examples of Their Applications .............. 4
4 Forecasting in practice ................................................................................................. 4
5 Choosing a method .......................................................................................................... 5
6 Some patterns in time series ........................................................................................... 5
7 Plotting Data .................................................................................................................... 7
7.1 Descriptive Statistics ............................................................................................... 7
7.2 Plots and Statistics using ITSM ............................................................................... 8
8 Modelling Approach ....................................................................................................... 8
9 Box-Cox Transformations ............................................................................................ 11
10 Sample autocorrelation and partial autocorrelation functions ................................ 12
11 White noise ..................................................................................................................... 16
12 Tests for white noise ...................................................................................................... 17
12.1 Time plot ................................................................................................................ 17
12.2 Tests for serial correlation ..................................................................................... 17
12.3 Difference-sign test ................................................................................................ 17
12.4 A runs test of randomness...................................................................................... 17
12.5 Turning points test of randomness ......................................................................... 18
12.6 Phase length test of randomness ............................................................................ 18
12.7 Rank tests ............................................................................................................... 18
12.8 Spectral representation of a random time series .................................................... 18
12.9 Test of Normality ................................................................................................... 19
13 Self assessment exercises............................................................................................... 20
14 Questions for submission Week 1 (Due: 7 August 2014)
[1] This section is based on Chatfield, C. (1996), The Analysis of Time Series: An Introduction, Chapman & Hall (London).
[2] Armstrong, J. S. (1985), Long-Range Forecasting (2nd ed.), Wiley (New York).
[3] Wright, G. and Ayton, P. (1987), Judgmental Forecasting, Wiley (Chichester).
[4] Gardner, E. S. Jr (1983), Automatic monitoring of forecast errors, J. Forecasting 2: 1-21.
[5] Available as an Excel file, Beer.xls, from MyRMIT Studies. Open the file, highlight just the data (no headings) and copy it to the clipboard (Ctrl+C). In ITSM, successively select File, Project, New, Univariate, then File, Import Clipboard, and you will see a graph of the data. At this stage you should save the file as an ITSM object (select File, Project, Save as, choose the directory you want to save the file to, type Beer in the File Name box and, with the Files of Type selection automatically being ITSM files (*.TSM), press Save). The file can subsequently be opened directly as an ITSM file, Beer.tsm.
[6] Farnum & Stanton (1989), page 291.
[7] Bartlett, M. S. (1946), On the theoretical specification of sampling properties of autocorrelated time series, J. Roy. Stat. Soc. B, 8: 27-41.