Introduction To Forecasting
$\sum_{i=1}^{L} S_i = 0$
- Multiplicative: $E\{X_t\} = f(t; \beta_1, \beta_2, \ldots)\,S_t$ ; $\sum_{i=1}^{L} S_i = L$
- where $S_t = S_{t+L} = S_{t+2L} = \cdots$ and $f(t; \beta_0, \beta_1, \ldots)$ is a function describing the trend.
Each observation period is called a season, $L$ is the length of seasonality and $S_t$ are seasonal indexes.
- Cycles are an irregular tendency to oscillate at an almost fixed frequency.
- An outlier (or influential point) is an unusual observation.
Figure 5 Some basic patterns: (A) constant process, (B) pulse, (C) ramp, (D) step; each shown with no error and with random error
Figure 6 Patterns based on Pegels' 1969 classification: rows (A) no trend effect, (B) additive trend, (C) multiplicative trend; columns (1) no seasonal effect, (2) additive seasonal, (3) multiplicative seasonal
Figure 7 Seasonality with linear trend: quarterly series illustrating additive seasonality and multiplicative seasonality
Before building mathematical models for predicting time series, a forecaster should investigate
the basic structure of the time series. For instance:
- Is the series cyclic?
- Does it increase (or decrease) over time?
- Does the variability remain constant over time?
- Are there carry-over effects from previous values of the series?
- How long do these effects last?
To help answer some of these questions there are a few useful and simple graphical techniques
that can be used.
7 Plotting Data
The first step in the analysis of any time series is to plot the data against time. This immediately
reveals any trend over time, any regular seasonal behaviour and other systematic features of the
data. These need to be identified so that we can incorporate them into our mathematical model later.
Another useful graph is the simple histogram. This reveals whether the data are skewed or
roughly symmetric. Most time series models require the data to have roughly symmetric
histograms. We will consider ways of transforming data to make it symmetric shortly.
7.1 Descriptive Statistics
It is often helpful to calculate several statistics from a time series. For example, the mean and
standard deviation are very useful in measuring the location and spread of the data.
Another statistic we shall use is the coefficient of skewness. This is a measure of how skewed
the histogram of the data is. Positive skewness indicates the histogram has a long right-hand tail,
negative skewness indicates the histogram has a long left-hand tail. Symmetric data have zero
skewness. These statistics are calculated (for large sample sizes) as follows:
Mean:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Standard Deviation:
$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2}$
Coefficient of Skewness:
$v = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^3}{s^3} = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i^3 - 3\bar{X}\,\frac{1}{n}\sum_{i=1}^{n} X_i^2 + 2\bar{X}^3}{s^3}$
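These formulas translate directly into code. As a quick check outside ITSM, here is a minimal Python sketch (the data values are invented purely for illustration):

```python
import numpy as np

def describe(x):
    """Mean, standard deviation and coefficient of skewness,
    using the large-sample (divide-by-n) formulas above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.sum() / n
    s = np.sqrt(((x - mean) ** 2).sum() / n)   # divide by n, not n - 1
    v = ((x - mean) ** 3).sum() / n / s ** 3   # positive => long right tail
    return mean, s, v

data = [12.0, 15.0, 11.0, 18.0, 30.0]          # invented values
m, s, v = describe(data)
print(m, round(s, 3), round(v, 3))             # the 30 gives positive skew
```

For large $n$ the divide-by-$n$ and divide-by-$(n-1)$ versions are practically identical, which is why the large-sample forms above are used.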
7.2 Plots and Statistics using ITSM
With ITSM the data can be plotted by selecting New Project, followed by Import Data, and selecting the ASCII file; a graph of the data vs. time then appears on the screen. The project can then be saved, e.g. as Name.tsm (the tsm extension is added automatically).
Histograms may be plotted from several submenus. By selecting the appropriate menu the associated statistics are displayed.
Example: To illustrate this procedure, we shall use the data file Beer.tsm.[5] This is the Australian quarterly beer production in megalitres from March 1956 to June 2010 (after which the data were no longer published). Using ITSM, we can then obtain a time plot similar to that shown in Figure 9.
A histogram of the data is given by clicking on the histogram button, or by using the menu entry
Statistics, followed by histogram.
Clicking on the INFO button (while the time series
graph is active) gives the following information:
# of Data Points = 218
Sample Mean = .4154E+03
Sample Variance = .734166E+04
Std.Error(Sample Mean) = 17.861581
(square root of (1/n)SUM{(1-|h|/r)acvf(h)},
|h|<r=[sqrt(n)])
MODEL:
ARMA Model:
X(t) = Z(t)
WN Variance = 1.000000
Garch Model for Z(t):
Z(t) = sqrt(h(t)) e(t) h(t) = 1.000000
{e(t)} is IID N(0,1)
Figure 8 Histogram for Beer data
8 Modelling Approach
From the histogram and time plot, various features of the data become apparent. Seasonal
behaviour, long-term trends, changes in level and variability over time are frequently observed.
Example: For the beer data, we observe that
- the mean
- the variation
- a systematic component
- changes
In forecasting, we need a model that incorporates these basic features of the data. The basic
approach we adopt is to systematically eliminate each of these features until we have totally
featureless data.
A time series that has no trend or seasonality and has mean, variance and covariances constant
over time is called stationary. The first stage in eliminating data features is to make the series
stationary. Once we have a stationary series, we consider other features that are not so apparent
to the naked eye. Having incorporated these into our model as well, we will finally be able to
forecast!
Figure 9 Quarterly Australian beer production in megalitres from March 1956 to June 2010
[ITSM toolbar: plot histogram; move between available windows; carry out transformations (Box-Cox, difference data, remove trend, etc.); select a subset of the data (trim); reverse transformations; information on the current window; plot sample ACF/PACF; tests for randomness]
Figure 10 Pre-mixed concrete production ('000 cubic metres), February 1976 to March 2014
Figure 11 Logarithm of Pre-mixed concrete data
9 Box-Cox Transformations
Different techniques are used for eliminating each type of feature. One such technique
is the Box-Cox transformation which may let you turn skewed data into symmetric data
and make the variance constant over time. If the original observations are $Y_1, Y_2, \ldots, Y_n$, the Box-Cox transformation $f_\lambda$ converts them to $Z_1, Z_2, \ldots, Z_n$, where $Z_t = f_\lambda(Y_t)$ and
$f_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \lambda \neq 0 \\ \log y, & \lambda = 0 \end{cases}$
These transformations are especially useful when the variability of the data increases or decreases with the level. By suitable choice of $\lambda$, the variability can often be made nearly constant. In particular, for positive data with standard deviation increasing linearly with level, the variability can be stabilised by choosing $\lambda = 0$.
A choice of $\lambda < 1$ will decrease skewness while $\lambda > 1$ will increase skewness.
The choice of $\lambda$ can be made by trial and error, using the graphs of the transformed data. (After inspecting the graph you can use the sliding scale to try various values of $\lambda$.) Very often it is found that no transformation is needed or that the choice $\lambda = 0$ is satisfactory.
Logarithms are common in economics, for example, and indicate that percentage
changes are important.
Where the time series contains zero or negative values, it is necessary to add a constant
to the series to make all values positive.
The Box-Cox transformation can be carried out using ITSM by selecting that option
from the Data Menu. The transformation may be reversed.
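The transformation itself is easy to reproduce outside ITSM. The sketch below implements the definition above directly in Python; the generated series is illustrative only (scipy.stats.boxcox additionally offers a maximum-likelihood choice of $\lambda$):

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, log(y) for
    lam == 0.  Requires strictly positive data."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

# Illustrative series whose spread grows with its level:
rng = np.random.default_rng(0)
level = np.linspace(10.0, 100.0, 200)
y = level * rng.lognormal(sigma=0.1, size=200)
z = box_cox(y, 0)            # lam = 0: same shape as np.log(y)
assert np.allclose(z, np.log(y))
```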
Example: For the concrete data the variability increases with level and the data are strictly positive. Taking natural logarithms (for instance, choosing a Box-Cox transformation with $\lambda = 0$) gives the transformed data shown in Figure 11. (Using ITSM, open the univariate data file Pre-MixedConcrete.tsm, select Transform, then the Box-Cox transformation option, move the parameter pointer (which ranges between 0 and 1.5) and observe how the graph changes.)
The characteristics of the transformed series are,
- the mean
- the variation
- a systematic component
Notice how the variation no longer increases. The seasonal effect remains, as does the upward trend. These need to be handled (using other techniques which shall be discussed later). If the log transformation has stabilised the variability, it is not necessary to consider other values of $\lambda$. Note that the data stored in ITSM would now consist of the natural logarithms of the original data.
10 Sample autocorrelation and partial autocorrelation functions
When concerned about independence in a time series it is natural to calculate the correlation coefficient between successive observations in the series.
For the house price data shown earlier, a scatter plot of each observation against the previous one ($Y_t$ against $Y_{t-1}$) is shown below, the correlation between adjacent observations being 0.998. The scatter plot of each observation against the observation before the previous one ($Y_t$ against $Y_{t-2}$) is also shown below, the correlation between observations two lags apart being 0.9703. For observations three, four and five lags apart the correlations are 0.9357, 0.8978 and 0.8561 respectively.
Figure 12 House prices: scatter plots of $Y_t$ against $Y_{t-1}$, $Y_{t-2}$ and $Y_{t-3}$ (using Minitab draftsman plots)
The correlogram is a plot of the sample autocorrelation function (acf) against lags 1 to 40.
Figure 13 Correlogram of house price data
Figure 14 Correlogram for random data (n = 100) simulated with seed 12345
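The ingredients of a correlogram can be computed in a few lines. The following Python sketch implements the sample ACF from the usual sample autocovariances; the strongly trending series is illustrative only:

```python
import numpy as np

def sample_acf(x, max_lag=40):
    """Sample ACF r_k = c_k / c_0 with
    c_k = (1/n) * sum_t (x_t - xbar)(x_{t+k} - xbar)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    c0 = (d * d).sum() / n
    return np.array([(d[:n - k] * d[k:]).sum() / n / c0
                     for k in range(max_lag + 1)])

# A strongly trending series has an ACF that dies out very slowly:
r = sample_acf(np.arange(100, dtype=float), max_lag=5)
# r[0] is 1 by construction; r[1] is close to 1 for trending data.
```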
For a series in which $Y_t$ is correlated with $Y_{t-1}$ (say with correlation $\rho$), $Y_t$ will also be correlated with $Y_{t-2}$ (with correlation $\rho^2$). We can determine the degree of correlation between $Y_t$ and $Y_{t-2}$ after adjustment for the first-order correlation. This produces the partial autocorrelation function (PACF). There are two basic methods: one involves regressing $Y_t$ on $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k}$, so that the partial autocorrelation coefficient $\phi_{kk}$ for lag $k$ is the coefficient of $Y_{t-k}$. The other method calculates the coefficients recursively.[6]
Note that $\rho_0 = \phi_{00} = 1$ and $\rho_1 = \phi_{11}$.
Figure 15 ACF & PACF for the house price data
Suppose we were to difference the data at lag 1, letting $Y_t = X_t - X_{t-1}$.
Figure 16 Time series plot for the first differences in house prices
Figure 17 ACF & PACF for the first differences in house prices
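Differencing itself is one line of code. A small Python sketch, with invented series showing that lag-1 differencing removes a linear trend and lag-4 differencing removes a quarterly pattern:

```python
import numpy as np

def diff(x, lag=1):
    """Difference a series at the given lag: y_t = x_t - x_{t-lag}."""
    x = np.asarray(x, dtype=float)
    return x[lag:] - x[:-lag]

# A linear trend disappears after differencing once at lag 1:
x = 3.0 * np.arange(10) + 5.0
y = diff(x, 1)                               # constant series of 3s
assert np.allclose(y, 3.0)

# A quarterly pattern disappears after differencing at lag 4:
season = np.tile([2.0, -1.0, 0.5, -1.5], 6)  # illustrative pattern
assert np.allclose(diff(season, 4), 0.0)
```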
Figure 18 ACF/PACF of ln(Portland cement), quarterly from March 1956 to March 2014
Figure 19 ACF/PACF of quarterly Sparkling wine from March 1980 to March 2014
Testing whether a particular autocorrelation or partial autocorrelation coefficient is zero can be done using a rule of thumb.[7]
ACF: test $H_0: \rho_k = 0$ against $H_1: \rho_k \neq 0$ using $\hat{\rho}_k \sim$ approx. $N(0, 1/n)$
PACF: test $H_0: \phi_{kk} = 0$ against $H_1: \phi_{kk} \neq 0$ using $\hat{\phi}_{kk} \sim$ approx. $N(0, 1/n)$
Some common patterns in the ACF are:
1. Very slowly dying out, like Figure 18: evidence of trend in the series.
2. Spikes at lag L and at multiples of lag L, like Figures 19 and 21: evidence of seasonality.
Figure 20 Time series plot of ln(Beer) after removing quadratic trend
# of data points = 213; sample mean = 6.0048; sample variance = 0.051654
Figure 21 ACF/PACF of ln(Beer) after removing quadratic trend
Figure 22 Time series plot of ln(Beer) after differencing at lag 4
Figure 23 ACF/PACF of ln(Beer) after differencing at lag 4
# of data points = 209; sample mean = 0.0097; sample variance = 0.001987
Figure 24 Time series plot of ln(Beer) after differencing at lag 4 and at lag 1
Figure 25 ACF/PACF of ln(Beer) after differencing at lag 4 and at lag 1
11 White noise
Many time series models are defined using white noise processes. Borrowing from
engineering terminology a time series may be considered to consist of a signal and an
irregular noise component. If the noise component is a stationary series of uncorrelated
random variables having zero mean and constant variance it is referred to as white
noise. Equivalently a white noise process has an autocorrelation function:
$\rho(k) = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases}$
The process is often written $Y_t \sim \mathrm{WN}(0, \sigma^2)$ or $Y_t \sim \mathrm{IID}(0, \sigma^2)$, or $Y_t \sim \mathrm{NID}(0, \sigma^2)$ if the process is Gaussian, i.e. normally distributed. The most common purpose of white noise tests is to check the residuals after fitting a model to some time series data. The residuals are defined as the h-step prediction errors
$e_t = Y_t - \hat{Y}_{t|t-h}$
where $Y_t$ is the actual observation at time $t$ and $\hat{Y}_{t|t-h}$ is its forecast made $h$ steps earlier.
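One informal way to screen residuals against white noise is to count how many sample autocorrelations fall outside the $\pm 1.96/\sqrt{n}$ bounds. A rough Python sketch (the 25% cut-off is an arbitrary, deliberately lenient illustrative choice, not ITSM's rule):

```python
import numpy as np

def sample_acf_1(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    n, d = len(x), x - np.mean(x)
    c0 = (d * d).sum() / n
    return np.array([(d[:n - k] * d[k:]).sum() / n / c0
                     for k in range(1, max_lag + 1)])

def looks_like_white_noise(x, max_lag=20):
    """Crude screen: for white noise only about 5% of sample
    autocorrelations should fall outside +/- 1.96/sqrt(n)."""
    bound = 1.96 / np.sqrt(len(x))
    r = sample_acf_1(x, max_lag)
    return float(np.mean(np.abs(r) > bound)) <= 0.25   # lenient cut-off

rng = np.random.default_rng(12345)
noise = rng.normal(0.0, 1.0, 500)              # NID(0, 1)
trend = np.arange(500, dtype=float)            # certainly not white noise
```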
1.18
12.5 Turning points test of randomness
Counting the number of 'peaks' and 'troughs' that appear in the data gives the number of turning points, T, which is approximately normally distributed with $\mu_T = \frac{2}{3}(n-2)$ and $\sigma_T^2 = \frac{16n-29}{90}$. ITSM includes this statistic in the set of tests for randomness.
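The test statistic can be sketched directly from these definitions (the series used are illustrative only):

```python
import math
import numpy as np

def turning_points_z(x):
    """Standardised turning points statistic (T - mu_T) / sigma_T;
    |z| > 1.96 casts doubt on randomness at the 5% level."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = sum(1 for i in range(1, n - 1)
            if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > 0)  # peak or trough
    mu = 2.0 * (n - 2) / 3.0
    sigma = math.sqrt((16.0 * n - 29.0) / 90.0)
    return (t - mu) / sigma

rng = np.random.default_rng(1)
z_random = turning_points_z(rng.normal(size=300))   # should be small
z_trend = turning_points_z(np.arange(300.0))        # no turning points at all
```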
12.6 Phase length test of randomness
The interval between two turning points is called a phase. We count the number of times each phase length occurs and compare it with the expected number of occurrences of each phase length $d$ in a random series of length $n$, given by
$\dfrac{2(n-d-2)(d^2+3d+1)}{(d+3)!}$
Since the usual $\chi^2$ test is invalid because the phase lengths are not independent, a modified $\chi^2$ test is suggested, using $2\tfrac{1}{2}$ degrees of freedom for values > 6.3 and $\tfrac{6}{7}\chi^2$ with two degrees of freedom for lower values. (See Kendall, pages 25 and 26, for details.)
12.7 Rank tests
There are a number of rank tests which are also used to detect non-randomness. For instance, the rank correlation coefficient, sometimes known as Kendall's $\tau$, is based on P, the number of pairs $(u_i, u_j)$, $j > i$, in the series for which $u_j > u_i$. P is approximately normally distributed with mean $\frac{1}{4}n(n-1)$ and variance $\frac{n(n-1)(2n+5)}{72}$. It can be shown that
$\tau = \dfrac{4P}{n(n-1)} - 1$
has mean zero and variance $\dfrac{2(2n+5)}{9n(n-1)}$. It is the statistic P that is used in the ITSM programme.
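A brute-force sketch of the standardised P statistic (O(n²) pair counting, fine for short series; the monotone series below are purely illustrative):

```python
import math

def kendall_p_z(u):
    """Standardised P: the number of pairs (i, j), j > i, with
    u_j > u_i, compared with its null mean n(n-1)/4 and variance
    n(n-1)(2n+5)/72 (no ties assumed)."""
    n = len(u)
    p = sum(1 for i in range(n) for j in range(i + 1, n) if u[j] > u[i])
    mean = n * (n - 1) / 4.0
    var = n * (n - 1) * (2.0 * n + 5.0) / 72.0
    return (p - mean) / math.sqrt(var)

z_up = kendall_p_z(list(range(50)))          # steadily rising: P at maximum
z_down = kendall_p_z(list(range(50, 0, -1))) # steadily falling: P = 0
```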
12.8 Spectral representation of a random time series
A time series can also be thought of as a weighted sum of cyclic functions (sines and cosines, for instance). If every frequency contributed equally to the series, a plot of the contributions by frequency would be flat. This plot is called a spectrum.
For a random series, the spectrum would resemble:
Figure 26 Model spectrum for a random series (power against frequency: flat)
We will see later how to interpret sample spectra. For the ln(Beer) series differenced at lag 4, choosing Spectrum, then Cumulative periodogram, produces
Figure 27 Cumulative periodogram for ln(Beer) differenced at lag 4
And choosing Spectrum, then Fisher's test, shows that we should reject the hypothesis that all frequencies contribute equally to the variance of the series (so the data in Figure 22 are not random).
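The cumulative periodogram itself is straightforward to compute. A Python sketch using NumPy's FFT (an illustration of the idea, not ITSM's exact implementation):

```python
import numpy as np

def cumulative_periodogram(x):
    """Running sum of the periodogram ordinates at the Fourier
    frequencies, normalised to end at 1.  For white noise the plot
    climbs roughly linearly from 0 to 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2
    spec = spec[1:n // 2 + 1]       # drop the zero-frequency ordinate
    c = np.cumsum(spec)
    return c / c[-1]

rng = np.random.default_rng(7)
c = cumulative_periodogram(rng.normal(size=400))   # 200 ordinates
```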
12.9 Test of Normality
Non-normality of the residuals may first be assessed using a histogram
(or stem and leaf plot) and checked more carefully using (for instance in Minitab)
NSCORES of the data in C store in C
PLOT normal scores in C with the data in C
which should be linear. The data are considered to be non-normal if the correlation
measure of linearity
CORR normal scores in C with the data in C
falls below the critical values:
Sample size, n   Significance level
                  0.01    0.05    0.10
 50               0.966   0.976   0.981
 60               0.971   0.980   0.984
 75               0.976   0.984   0.987
100               0.981   0.986   0.989
150               0.987   0.991   0.992
200               0.990   0.993   0.994
Table 3 Critical values for the normal scores test
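The normal-scores correlation measure can be approximated in a few lines. A Python sketch using the Blom plotting positions (an assumption on my part; Minitab's exact NSCORES formula may differ slightly):

```python
import numpy as np
from statistics import NormalDist

def normal_scores_corr(x):
    """Correlation between the ordered data and approximate normal
    scores computed from Blom plotting positions (i - 3/8)/(n + 1/4).
    Values well below 1 point to non-normality (compare Table 3)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    nd = NormalDist()
    scores = np.array([nd.inv_cdf((i - 0.375) / (n + 0.25))
                       for i in range(1, n + 1)])
    return float(np.corrcoef(x, scores)[0, 1])

rng = np.random.default_rng(3)
r_normal = normal_scores_corr(rng.normal(size=200))       # near 1
r_skewed = normal_scores_corr(rng.exponential(size=200))  # noticeably lower
```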
Normality tests available in Excel include the Lilliefors test; for instance, for the house price data with the histogram shown below:
Figure 28 Histogram of the house price data
Figure 29 Testing normality using the StatPro Add-Ins in Excel
The tests for randomness described above are all valid even if the data are not normally distributed.
13 Self assessment exercises
Use ITSM to duplicate the plots.
Forecasting 1
Week 1: Introduction to time series ....................................................................................... 1
1 Objectives ......................................................................................................................... 1
2 Introduction ..................................................................................................................... 1
3 Classifying forecasting problems ................................................................................... 3
3.1 Subjective methods .................................................................................................. 3
3.2 Univariate methods .................................................................................................. 4
3.3 Multivariate methods ............................................................................................... 4
3.4 Categories of Forecasting Methods and Examples of Their Applications .............. 4
4 Forecasting in practice ................................................................................................. 4
5 Choosing a method .......................................................................................................... 5
6 Some patterns in time series ........................................................................................... 5
7 Plotting Data .................................................................................................................... 7
7.1 Descriptive Statistics ............................................................................................... 7
7.2 Plots and Statistics using ITSM ............................................................................... 8
8 Modelling Approach ....................................................................................................... 8
9 Box-Cox Transformations ............................................................................................ 11
10 Sample autocorrelation and partial autocorrelation functions ................................ 12
11 White noise ..................................................................................................................... 16
12 Tests for white noise ...................................................................................................... 17
12.1 Time plot ................................................................................................................ 17
12.2 Tests for serial correlation ..................................................................................... 17
12.3 Difference-sign test ................................................................................................ 17
12.4 A runs test of randomness...................................................................................... 17
12.5 Turning points test of randomness ......................................................................... 18
12.6 Phase length test of randomness ............................................................................ 18
12.7 Rank tests ............................................................................................................... 18
12.8 Spectral representation of a random time series .................................................... 18
12.9 Test of Normality ................................................................................................... 19
13 Self assessment exercises............................................................................................... 20
14 Questions for submission Week 1 (Due: 7 August 2014)
[1] This section is based on Chatfield, C. (1996), The Analysis of Time Series: An Introduction, Chapman & Hall (London).
[2] Armstrong, J. S. (1985), Long-Range Forecasting (2nd ed.), Wiley (New York).
[3] Wright, G. and Ayton, P. (1987), Judgmental Forecasting, Wiley (Chichester).
[4] Gardner, E. S. Jr (1983), Automatic monitoring of forecast errors, J. Forecasting 2: 1-21.
[5] Available as an Excel file, Beer.xls, from MyRMIT Studies. Open the file, highlight just the data (no headings) and copy it to the clipboard (Ctrl+C). In ITSM, successively select File, Project, New, Univariate, then File, Import Clipboard, and you will see a graph of the data. At this stage you should save the file as an ITSM object (select File, Project, Save as, choose the directory you want to save the file to, type Beer in the File Name box and, with the Files of Type selection automatically being ITSM files (*.TSM), press Save). The file can subsequently be opened directly as an ITSM file, Beer.tsm.
[6] Farnum & Stanton (1989), page 291.
[7] Bartlett, M. S. (1946), On the theoretical specification of sampling properties of autocorrelated time series, J. Roy. Stat. Soc. B, 8: 27-41.