STA2604 Study Guide

Department of Statistics

STA2604: Forecasting II

Lecture Notes (Study Guide)

2021

Contents

1 An Introduction to Forecasting 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Components of a time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Forecasting Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.1 Qualitative methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.2 Quantitative methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Errors in forecasting and forecast accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.1 Absolute deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3.2 Mean absolute deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3.3 Squared error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.4 Mean squared error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.5 Absolute percentage error (APE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.6 Mean absolute percentage error (MAPE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.7 Forecasting accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Choosing a forecasting technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.1 Factors to consider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.2 Strike the balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5 An overview of quantitative forecasting techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2 Model Building and Residual Analysis 25


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Multicollinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.1 The variance inflation factor (VIF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3 Comparing regression models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4 Basic residual analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4.1 Residual plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.2 Constant variation assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.3 Correct functional form assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.4 Normality assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.5 Independence assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4.6 Remedy for violations of assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 Outliers and influential observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.5.1 Leverage values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.2 Residual magnitude . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.3 Studentised residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5.4 Cook’s distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.5 Dealing with outliers and influential observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3 Time series regression 49


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 Modeling trend by using polynomial functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2.1 No trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.2 Linear trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.3 Quadratic and higher order polynomial trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3 Detecting autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3.1 Residual plot inspection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3.2 First-order autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 Seasonal variation types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4.1 Constant and increasing seasonal variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.5 Use of dummy variables and trigonometric functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.5.1 Time series with constant seasonal variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5.2 Use of dummy variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.5.3 High season and low season . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.5.4 Use of trigonometry in a model with a linear trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.6 Growth curve models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.7 AR(1) and AR(p) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.8 Use of trend and seasonality in forecast development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

4 Decomposition of a time series 89


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2 Multiplicative decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2.1 Trend analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2.2 Seasonal analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.3 Determination of the trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.2.4 Estimation of the cyclical and irregular values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.2.5 Obtaining a forecast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.2.6 Prediction interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.3 Additive decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5 Exponential smoothing 102


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Simple exponential smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.3 Tracking signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.4 Holt’s trend corrected exponential smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.5 Holt-Winters methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

5.5.1 Additive Holt-Winters method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5.2 Multiplicative Holt-Winters method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.6 Damped trend exponential smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.6.1 Damped trend method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.6.2 Additive Holt-Winters with damped trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.6.3 Multiplicative Holt-Winters with damped trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

Welcome message
Dear Student,

Welcome to the module Forecasting II (STA2604), which is offered in the Department of Statistics. I am Prof G Kabera and I
will be your lecturer for this module. I trust that this module will deepen your understanding of Statistics and help you further
your studies in general. The module will enable you to explore different aspects of time series forecasting methods. It enables
the student to explore and analyse a wide spectrum of problems on basic time series. More specifically, on completing the module
you should be able to describe the components of a time series model, and to estimate and interpret them.

The study material for this module is available online only. You will find more details on how to study this module in Tutorial
Letter 101. The different options that are available on this site are shown on the left-hand side of the screen. You will find the study
material for the module in the folder Additional Resources. Your tutorial letters and past examination papers are stored under
Official Study Material. You may be requested to post your answers to certain activities in the Discussion forums tool, and you
may also use this tool to raise issues with me or your fellow students. After reading this page, you should read Tutorial Letter 101
(if you have not done so already). Then you should proceed to your study material. If you have any queries about the module, you
are welcome to contact me by email or telephone. I wish you all the best with your studies.

Prof G Kabera
Tel: +27 11 670 9062
Email: kaberg@unisa.ac.za
Office: GJ Gerwel Building 607, Science Campus, Florida

About this module
Prologue
Forecasting is the process of making statements about events whose actual outcomes have not yet been observed. A commonplace
example might be the estimation of the expected value for some variable of interest at some specified future date. Prediction
is similar, but is a more general term. Both might refer to formal statistical methods employing time series, cross sectional or
longitudinal data, or alternatively to less formal judgemental methods. More will be seen in various parts of this module.

The module is about Forecasting, which deals with the methods used to predict the future, i.e. to forecast. Can you think of a
situation where predictions of the future are needed, or cases where forecasting is done? In this module forecasting is by nature
quantitative: it uses numeric data. There are also various qualitative forecasting methods, which are based on non-numeric data.
Even though qualitative methods feature in some of our discussions, they are not dealt with in depth in this module.

This module presents fundamental aspects of Time Series analysis used in forecasting. The prescribed textbook for this module
is Bowerman, O’Connell and Koehler (2005). We will not study all the chapters in the book for this module, but will focus on
Chapters 1, 5, 6, 7 and 8. It is assumed that you are very familiar with the material on simple and multiple linear regression covered
in Chapters 3 and 4. Please revise these two chapters since Chapters 5 to 8 make strong use of linear regression techniques.

The module runs over one year.

About the book


The prescribed book is reader-friendly and contains limited mathematical theory. It is geared towards the practice of forecasting.
The authors are experienced practitioners in the field of time series. The book will assist you in understanding concepts and
methodology, and in applying these in practice (i.e. in real-life situations). The prescribed textbook may change, and in that case
some adjustments may be made to the present study guide.

The computer and the calculator


We recommend that you acquire a non-programmable scientific calculator of your own. It is imperative to have your own
calculator in the examination. It is important, although not compulsory, to have access to a computer in order to undertake the tasks
in this module. You may visit a Regional Centre to use a computer.
This text contains output from Excel, MINITAB, JMP IN and SAS. However, we encourage the use of any software to which
you may have access. In general, the use of Excel is enough. The above list of computer software/packages may be used, as well
as R, SPSS, Stata, S-Plus and EViews. Your ability to use such software will increase your marketability in the workplace. You are
encouraged to experiment with the packages at your disposal.

References
The prescribed book must be purchased. Refer to the study guide regularly. We shall also refer to a number of user-friendly
textbooks on Time Series that are available in the Unisa library.

Prescribed book
Bowerman, B. L., O’Connell, R. T. & Koehler, A. B. (2005) Forecasting, time series and regression: an applied approach, 4th
edition. Singapore: Thomson Brooks/Cole.

The presentation of the module
This study guide summarises the five chapters of interest in the prescribed textbook.

Prior knowledge
It is important that you are familiar with a section before moving to the next one. This will serve as a foundation for the forthcoming
work. Leaving out work without understanding can only lead to an accumulation of problems when responding to assignments and
writing the examination. This is also true about the prerequisites from first-year statistics and the knowledge you have acquired
through the years. Sensible or smart application is based on the use of the accumulated techniques, experiences and knowledge.
Plotting of graphs, fitting a linear model, and so on, are needed in some instances. You are urged, therefore, to incorporate all
the useful techniques in the solutions to exercises. We advise you to revisit these topics in your first-year modules and in some
second-year modules.

It is necessary to realise that numbers alone do not provide all the answers. It should be clear to you that aspects of a qualitative
nature add value to the predictions made so that the data context is clear.

This study guide


In this study guide we attempt to present explanations of the concepts in the textbook. It contains easy examples as well as activities
for you to practise. You are encouraged to do the activities in order to learn effectively. Reading of feedback alone leaves gaps in
your learning. There are discussions following the activities so that the feedback is immediate. Do not just read through them; try
to explore them by testing that you can do them as well, even if you use alternative methods.

The exercises selected for assignments are important in reinforcing what you need to understand in this module. Take time to
understand the aspects that go with them. Analyse the postulates in the given statements and thereafter the requirements so that it
becomes easy to recall what is necessary in compiling a solution. In that way you do not only solve the problem, you understand
it and enjoy solving it. At the end of the year there is a two-hour examination. The choice between a closed-book and an open-book
examination will depend on global health conditions, e.g. Covid-19 or other pandemic constraints. The discussions in the study
guide and the textbook will assist you in mastering the module and therefore prepare you for the examination.

This study guide has been prepared to guide you through the prescribed book, but it does not replace the textbook. Therefore,
we will always use it together with the prescribed book. Read them together. The textbook presents the concepts while the study
guide attempts to bring the concepts closer to you. Hence, the prescribed textbook is more important than the study guide.

Each study unit starts with the outcomes in order to show you what you need to know and to evaluate yourself. The table of
outcomes also gives each outcome together with the way the outcome will be assessed, the content needed for that outcome, the
activities that will be used to support the understanding of the content and the way feedback will be given. Your input in the form of
positive criticism to improve the presentation will be of importance in the review of this study guide. You are therefore encouraged
to suggest ways that you believe can improve the presentation of this module.

Module position in the curriculum


We have been offering a postgraduate module on Time Series at Unisa, but have become aware of the need to introduce related
modules at undergraduate level due to its necessity in the workplace and in order to fill the gap that is evident when students attempt
the postgraduate time series module.

This module is part of the whole Statistics curriculum at Unisa. Its position on the curriculum structure is as follows:

1st year: STA1501  STA1502  STA1503
2nd year: STA2601  STA2602  STA2603  STA2604 (Forecasting II - we are here)  STA2610
3rd year: STA3701  STA3702  STA3703  STA3704  STA3705  STA3710

You should already be familiar with some of the modules mentioned above. Knowledge from STA2604 will help you in
STA3704 (Forecasting III).

Assignments
There are three assignments for this module, which are intended to help you learn through various activities. They also serve as
tests to prepare you for the examination. As you do the assignments, study the reading texts, consult other resources, discuss the
work with fellow students or tutors or do research, you are actively engaged in learning. Looking at the assessment criteria given
for each assignment will help you to better understand what is required of you. The three assignments form part of the learning
process. The typical assignment question is a reflection of a typical examination question.
There are fixed submission dates for the assignments and each assignment is based on specific chapters (or sections) in the
prescribed book. You have to adhere to these dates as assignments are only marked if they are received on or before the due dates.
The three assignments are compulsory as

● they are the sole contributors towards your year mark and

● they form an integral part of the learning process and indicate the form and nature of the questions you can expect in the
examination.
Please note that the submission of Assignment 01 is currently the guarantee for examination entry. If you do not submit
Assignment 01, UNISA (not the Department of Statistics) will deny you examination entry. UNISA may also require a
sub-minimum of 40% for the year mark for examination entry. Once that decision is implemented, it will no longer be enough
to submit Assignment 01 to be guaranteed examination entry.

You are urged to communicate with your lecturer(s) whenever you encounter difficulties in this module. Do not wait until
the assignment due date or the examination to make contact with lecturers. It is helpful to be ready long in advance. You are
also encouraged to work with your own peers, colleagues, friends, etc. However, you must work on your own when compiling
assignment answers. Sharing answers is pure plagiarism, and UNISA, like all educational institutions, does not tolerate plagiarism
or cheating. General details about the assignments are given in Tutorial Letter 101. However, the assignment questions will be
given to you gradually throughout the year.

Time series has its own useful terminology that should be understood. In order to familiarise yourself with it, let us start with
an easy activity. Activities help in the creation of a mind map of the module. The more you attempt these activities, the better you
will understand the work.

Glossary terms
ACTIVITY 0.1

(a) Make a list of all the concepts that are printed in bold type in Chapters 1, 5, 6, 7 and 8 of the prescribed book. They serve as
your glossary. Of course this is a cumbersome task since there are several such concepts in the prescribed textbook.

(b) Attempt to explain the meanings of these concepts before you deal with the various sections so that you have an idea before
we get there.

DISCUSSION OF ACTIVITY 0.1

(a) There is a missing concept/term among the ones you listed, which is absolutely fundamental. It appears with other terms
or phrases. The term is “data”. You came across the term many times when you studied other modules and in some other
contexts. It is emphasised that it is a useful aspect in forecasting. If you do not have data, you will not be able to make
forecasts.

(b) Do not worry if the meanings you gave do not match the content in the tutorial letter or textbook. The intention was to make
you aware of aspects on which to focus in your learning. What is required from you is a step-by-step journey through the
prescribed material.

ACTIVITY 0.2

What is the meaning of the word data?

DISCUSSION OF ACTIVITY 0.2

There is a general misconception that data and information are the same concept. This is not necessarily the case. Data are
records of occurrences from which we obtain information. Data are not necessarily information on their own, but may sometimes
be information. The truth is, data possess information that is revealed after some analysis. They are often the raw answers we
receive from an investigation.

What to expect in the module


In this module we use a scientific calculator to perform calculations. We will also draw graphs and form mathematical models
(equations) that are used to develop forecasts and make decisions based on time series data. Most of these aspects were taught at
first-year level. The new topic is the pattern of time series data: the chronological form in which time series data appear is unique,
because without this form they cannot qualify as time series data.

Prerequisites
● The ability to use a scientific calculator.

● Access to a computer package and the ability to use it are highly recommended. The minimum requirement is the ability to
use Excel.

● First-year statistics. The following topics are of great importance in this module:

- Simple linear regression;


- Correlation measures;
- Polynomial models;
- Graph plotting.

We will highlight these topics when we encounter them.

When you draw plots required for statistical analysis, these plots should be accurate. Hence, use a ruler and a lead pencil (not
a pen) to construct plots. If you have access to a computer, you are also encouraged to practise using any statistical package of
your choice. Assignments may also be prepared by means of a computer. Currently, we only recommend using Excel for some
assignment questions. Just make sure that you use the correct notation. Avoid using a computer if you cannot write the correct
notation. Remember that you are always welcome to contact the lecturers whenever you have problems with any aspect of the
module.

Outcomes
At the end of the module you should be able to do the following:

● Apply important concepts and methods in forecasting and detect forecasting errors.

● Build a regression model and perform residual analysis.

● Model the trend of time series data and detect and handle first-order correlation.

● Conduct multiplicative and additive decompositions of a time series.

● Apply exponential smoothing methods.

The assessment, content, activities and feedback for these outcomes are presented in the table on the next page.

Table of outcomes
At the end of the module you should be able to:

- explain and explore time series components
  Assessment: analyse data; plot graphs
  Content: trend; seasonality; cycles; irregularity
  Activities: examine data visually; plot graphs
  Feedback: discuss likely errors

- select a model
  Assessment: balance factors
  Content: choose a statistical technique
  Activities: analyse errors; plot graphs
  Feedback: scrutinise models

- develop a model
  Assessment: form an equation
  Content: regression; exponential smoothing
  Activities: small build-up exercises
  Feedback: emphasise aptness

- estimate parameters
  Assessment: perform estimations
  Content: estimation methods
  Activities: perform calculations
  Feedback: discuss alternatives

- validate a model
  Assessment: statistical tests
  Content: hypothesis testing
  Activities: test hypotheses
  Feedback: peruse the various tests

- develop forecasts
  Assessment: demonstrate patterns
  Content: model building
  Activities: form equations
  Feedback: visit various alternatives

You will know that you understand this module once you are able to define, describe and apply the concepts in the above
outcomes.

Feedback is not just a follow-up of the preceding concepts. It provides you with an opportunity to reinforce some concepts
and revise others. Make use of this opportunity. Feedback is given after every activity, sometimes with some discussion after the
activity, but in many instances, it follows immediately after the activity.

Difficulties in forecasting terminology


Nearly all futurists describe the past as unchangeable, consisting of a collection of knowable facts. We generally perceive the
existence of only one past. When two people give conflicting stories of the past, we tend to believe that one of them must be lying
or mistaken.

This widely accepted view of the past might not be correct. Historians often interject their own beliefs and biases when they
write about the past. Facts become distorted and altered over time. It may be that the past is a reflection of our current conceptual
reference. In the most extreme viewpoint, the concept of time itself comes into question.

The future, on the other hand, is filled with uncertainty. Facts give way to opinions. The facts of the past provide the raw
materials from which the mind makes estimates of the future. All forecasts are opinions of the future (some more carefully
formulated than others). The act of making a forecast is the expression of an opinion. The future consists of a range of possible
future phenomena or events.

Defining a useful forecast


The usefulness of a forecast is not something that lends itself readily to quantification along any specific dimension (such as
accuracy). It involves complex relationships between many things, including the type of information being forecast, our confidence
in the accuracy of the forecast, the magnitude of our dissatisfaction with the forecast, and the versatility of ways that we can adapt to
or modify the forecast. In other words, the usefulness of a forecast is an application-sensitive construct. Each forecasting situation
must be evaluated individually regarding its usefulness.

One of the first rules is to consider how the forecast results will be used. It is important to consider who the readers of the final
report will be during the initial planning stages of a project. It is wasteful to spend resources on an analysis that has little or no
use. The same rule applies to forecasting. We must strive to develop forecasts that are of maximum usefulness to planners. This
means that each situation must be evaluated individually as to the methodology and type of forecasts that are most appropriate to
the particular application.

Forecasts create the future


Often the way we contemplate the future is an expression of our desire to create that future. Some argue that the future is
invented, not predicted. The implication is that the future is an expression of our present thoughts. The idea that we create our own
reality is not a new concept. It is easy to imagine how thoughts might translate into actions that affect the future.

Forecasting can, and often does, contribute to the creation of the future, but it is clear that other factors are also operating. A
holographic theory would stress the interconnectedness of all elements in the system. At some level, everything contributes to the
creation of the future. The degree to which a forecast can shape the future (or our perception of the future) has yet to be determined
experimentally and experientially.

Sometimes forecasts become part of a creative process, and sometimes they do not. When two people make mutually exclusive
forecasts, both of them cannot be true. At least one forecast is wrong. Does one person’s forecast create the future, and the other
does not? The mechanisms involved in the construction of the future are not well understood on an individual or social level.

Ethics in forecasting
Are predictions of the future a form of propaganda, designed to evoke a particular set of behaviours? Note that the desire for control
is implicit in all forecasts. Decisions made today are based on forecasts, which may or may not come to pass. The forecast is a
way to control today’s decisions.

The purpose of forecasting is to control the present. In fact, one of the assumptions of forecasting is that the forecasts will be
used by policy-makers to make decisions. It is therefore important to discuss the ethics of forecasting. Since forecasts can and
often do take on a creative role, no one has the absolute right to make forecasts that involve other people’s futures.

Nearly everyone would agree that we have the right to create our own future. Goal setting is a form of personal forecasting. It
is one way to organize and invent our personal future. Each person has the right to create his/her own future. On the other hand, a
social forecast might alter the course of an entire society. Such power can only be accompanied by equivalent responsibility.

There are no clear rules involving the ethics of forecasting. Value impact is important in forecasting, i.e. the idea that social
forecasting must involve physical, cultural and societal values. However, forecasters cannot leave their own personal biases out of
the forecasting process. Even the most mathematically rigorous techniques involve judgmental inputs that can dramatically alter
the forecast.

Many futurists have pointed out our obligation to create socially desirable futures. Unfortunately, a socially desirable future for
one person might be another person’s nightmare. For example, modern ecological theory says that we should think of our planet in
terms of sustainable futures. The finite supply of natural resources forces us to reconsider the desirability of unlimited growth. An
optimistic forecast is that we achieve and maintain an ecologically balanced future. That same forecast, the idea of zero growth, is
a catastrophic nightmare for the corporate and financial institutions of the free world. The system of profit depends on continual
growth for the well-being of individuals, groups, and institutions.

‘Desirable futures’ is a subjective concept. It can only be understood relative to other information. The ethics of forecasting
certainly involves the obligation to create desirable futures for the person(s) that might be affected by the forecast. If a goal of
forecasting is to create desirable futures, then the forecaster must ask the ethical question of “desirable for whom?”.

To embrace the idea of liberty is to recognise that each person has the right to create his/her own future. Forecasters can
promote libertarian beliefs by empowering people who might be affected by the forecast. Involving these people in the forecasting
process gives them the power to become co-creators of their futures.
Now that you have some background on forecasting, let’s start exploring the topic in detail in Unit 1.

Unit 1

An Introduction to Forecasting

1.1 Introduction
The aim of this unit is to define important concepts and methods in forecasting and detect forecasting errors. The outcomes of the
unit are:

• Identify and compare different forecasting models.

• Select a forecasting method that is appropriate for particular requirements and that is based on relevant time series data.

• Apply forecasting methods correctly.

• Detect errors in forecasting.

Further details on this unit's outcomes are given in the following table.

At the end of the unit you should be able to do the following:

- define time series terms
  Assessment: data plots and measures
  Content: time series word list
  Activities: experiment with data
  Feedback: discuss each activity

- decompose time series
  Assessment: visual inspection of graphs
  Content: time series components
  Activities: plot graphs
  Feedback: critique the graphs

- calculate time series measures
  Assessment: various calculations
  Content: errors in forecasting; the C-statistic
  Activities: stepwise exercises
  Feedback: discuss each activity

If you understand the above activities, it will be an indication that you understand this study unit.
Forecasting is the scientific process of estimating some aspects of the future in usually unknown situations. Prediction is
similar, but is a more general term. Both can refer to estimation of time series, cross-sectional or longitudinal data. Usage can

differ between areas of application: for example, in hydrology the terms "forecast" and "forecasting" are sometimes reserved for
estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the
number of times floods will occur over a long period.
It is essential to note that in this module the emphasis is on scientific forecasting. This is to ensure that we do not consider
subjective predictions and spiritual prophecies as part of our scope for this forecasting module. Risk and uncertainty are central to
forecasting and prediction. Forecasting is used in the practice of Customer Demand Planning in everyday business forecasting for
manufacturing companies. The discipline of demand planning, also sometimes referred to as supply chain forecasting, embraces
both statistical forecasting and a consensus process.
Forecasting is commonly used in discussion of time-series data. The terms relating to forecasting used in this module are fairly
straightforward and are explained in the prescribed book.

Forecasting has applications in many situations:

● Supply chain management - Forecasting can be used in Supply Chain Management to make sure that the right product is at
the right place at the right time. Accurate forecasting will help retailers reduce excess inventory and therefore increase profit
margins. Accurate forecasting will also help them meet consumer demand.

● Weather forecasting, Flood forecasting, and Meteorology

● Transport planning and Transport forecasting

● Economic forecasting

● Technology forecasting

● Earthquake forecasting

● Land use forecasting

● Product forecasting

● Player and team performance in sports

● Telecommunications forecasting

● Political forecasting

● Sales forecasting

ACTIVITY 1.1

Consider the terms “forecasting”, “cross-sectional data” and “time series”, which are the main focus of this study unit.

(a) Attempt to define these terms without consulting the prescribed textbook or any source such as Google.

(b) Check the definitions in the book and compare your answers in (a).

Before we discuss the above activity, start by reading slowly through the following discussion. Make sure you follow the
discussion.

1.1.1 Forecasting
Many people asked about the term “forecasting” make reference to the weather forecast that is presented on radio, television and
the internet. From this we can infer that the general public does not have a clear understanding of the meaning of forecasting.

Historical evidence shows that people have always been interested in the future. There are stories from history that inform
us that when people dreamed, there were experts to explain the meanings of these dreams in terms of the future. When signs of
future drought arose, the implications of the drought were noted and plans were made to offset the anticipated impacts. Drought
led to hunger. Thus, when predictions indicated that a drought was coming, preparations were made so that, at the time of the
drought, there would be enough food for every member of the community during the
duration of the drought. A good example of such an expert is the prisoner Joseph, who according to the Biblical story interpreted
to Pharaoh a dream about a seven-year famine in Egypt and surrounding regions, including Israel where Joseph came from as a
slave. That interpretation resulted in him being promoted from prisoner to prime minister. Predicting the future even as it was done
during the old days can be referred to as forecasting. The predicted future was then used to plan for the future as explained above.
The modern scientific approach has encouraged a more formalised conception of the practice of "anticipating the future".
This practice was then formally termed "forecasting". The current approaches are scientific in order to ensure that forecasting
is practised systematically. The predictions made are now called forecasts. In other words, forecasts are future expectations
based on scientific guidelines.

DISCUSSION OF ACTIVITY 1.1


The first term listed in Activity 1.1 was “forecasting”. Did you get that? The term forecasting is a “natural” operation. We have
always done it, sometimes unconsciously. As was explained, predicting activities have always been practised, even in ancient times.
For self-evaluation in terms of the time series concept, did you define the term forecasting in line with “predicting the future”?
Forecasting indicates more or less what to expect in the future. Once the future is known, preparation for equitable allocation
of resources can be made. Wastage can thus be reduced or eliminated and gains can be enhanced (or increased).

FURTHER DISCUSSION ON FORECASTING


Forecasting is applied in various real-life situations. Six examples of applications presented in the prescribed book are in the
following fields:

• Marketing department

• Finance

• Personnel management

• Production scheduling

• Process control

• Strategic management

Please read the details regarding the examples. Several examples can be mentioned including our own context at UNISA. The
number of student enrolments at Unisa is the starting point. The trend pattern will give an indication of whether there has been a
decline or growth in the student numbers over the years. If you are observant, you will realise that there has been an increase in
student numbers over the past few years. Our “forecast” for next year (2022) is that there will be more students than in 2021.

ACTIVITY 1.2
Weather forecasting was mentioned as a known example where forecasting is used abundantly. There are many others.

(a) Provide a simple example of a situation where forecasting is needed.


(b) Attempt to explain the details of the example you provided in (a).

DISCUSSION OF ACTIVITY 1.2
We discussed the Unisa example. If you are interested in African politics and elections you will be interested in making
predictions about political parties that are going to be in the forefront in the next election. You might anticipate extreme growth
of some political parties and the decline of other parties in a given country, based on the trends in the previous elections and
developments that prevail. Therefore,

(a) one can for example predict how the political parties will perform in the next election; and

(b) recent performance of the various parties in previous elections may be revisited and analysed, the current activities of the
parties may be analysed closely and one may interact with people to determine their impressions about various parties.

N.B.: Here we assume normal election conditions, where no intimidation, harassment or fraud takes place.

1.1.2 Data
Data are important for forecasting. Quality data, which loosely means reliable and valid data, are what forecasting needs.
It may be misleading if poor quality data are used, because the results are likely to be poor as well, even if the best methods are
used by a proficient analyst. The term data refers to groups of information that represent the qualitative or quantitative attributes
of a variable or set of variables. Data (plural of "datum", which is seldom used) are typically the results of measurements and can
be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from
which information and knowledge are derived. Raw data refers to a collection of numbers, characters, images or other outputs
from devices that collect information to convert physical quantities into symbols that are unprocessed.

Without data there can be no forecasting. However, it is important that data be correct (reliable, valid, realistic, etc.). Data
need to be both valid for the exercise and reliable. If one of these does not apply, then be warned that your forecasts may
mislead you or any user. Also, the collection of data may be inadequate to help in supporting the reasoning behind some findings.
Experience shows that when data are collected under certain contexts, explanations and contexts become clearer when findings are
associated with those contexts. Thus, if you assist in data collection of time series or any statistical data, whenever possible, advise
on the inclusion of details of the occurrences of the data. Giving details around events assists in reducing the extent of making
assumptions which may sometimes be incorrect.

The type of information used in forecasting determines the quality of the forecasts. Not all of us like boxing, but let us discuss
the next scenario. Imagine that two boxers were going to fight on the next Saturday. We were required to make a prediction in
order to win a million rand competition. Many participants looked at the past records of these boxers. They were informed that
in the previous seven years boxer Kangaroo Gumbu had won 25 out of 27 fights while boxer Boetie Blood had won 22 of the 30
fights he had in the same period. Gumbu was known for winning well while Blood had lost dismally in a recent fight. Let us pause
and enjoy the predictions (forecasts) made, just to make a good point.

ACTIVITY 1.3
Either as a person interested in boxing or someone hoping to win the money, you may be tempted to take a chance at the answer.
Make a prediction of the outcome of the fight based on the explanation given.

DISCUSSION OF ACTIVITY 1.3
Let us determine the probabilities as statisticians. Using frequencies, Gumbu had a probability of 0.93 of winning the fight
while Blood had a probability of 0.73 of winning the fight. On the basis of these probabilities, many participants predicted that
Gumbu was going to win.
Do you know how the probabilities 0.93 and 0.73 have been obtained? If it is not clear, divide the number of successes (wins)
of each boxer by the total number of fights that each boxer had fought.
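
As a quick check, here is a minimal sketch in Python (any software, or a calculator, works equally well; the win and fight
counts come from the scenario above):

    # Empirical win probabilities computed from each boxer's past record
    gumbu_wins, gumbu_fights = 25, 27
    blood_wins, blood_fights = 22, 30

    p_gumbu = gumbu_wins / gumbu_fights   # 25/27 = 0.926, about 0.93
    p_blood = blood_wins / blood_fights   # 22/30 = 0.733, about 0.73

    print(round(p_gumbu, 2), round(p_blood, 2))   # prints: 0.93 0.73

As the rest of the discussion shows, these relative frequencies are only as trustworthy as the assumptions behind the data.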

The data given were based on certain assumptions. Among others, there was the impression that the opponents of the two
boxers were of the same quality. If they were not, then the prediction would be carrying some “inaccuracies”. Among other
omissions, we were not told that the boxing bout was going to be held in the catchweight division, where boxers came from
different weight divisions and could not both fall within a single previously defined weight division. Blood had fought only world-
class opponents and came from two weight divisions heavier than the weight to which Gumbu belonged. That is, there was a
difference between the original weights of the two boxers. Gumbu, on the other hand, was a boxer who talked too much. He had
fought some mediocre opponents and wanted to pretend he was an excellent boxer. He had asked for the fight. In insisting on
the fight, he had called Blood a coward until the bout was sanctioned. At the time he was preparing for an elimination bout in his
weight division after which he was going to fight for a world title if he won. The planned elimination bout was probably going to
be the first real test for Gumbu as a professional fighter. It was going to come “after I am done with Blood,” boasted Gumbu.
In the street some people were predicting that Gumbu was going to lose, but they did not bet as money was required. None of
those who paid to enter the competition predicted correctly. The fight ended with a first-round knockout. Blood was the winner.
Gumbu was no match.

DISCUSSION OF THE BOXING SCENARIO


The records given were correct, but not complete. Records are past data. We need complete data and the exact context in
which they occurred in order to be able to make accurate forecasts. The analyses that were made about the boxers were correct, but
some assumptions were wrong. Assumptions are used to build cases, and methods are developed on conditions that are given as
assumptions. Wrong assumptions may lead to inappropriate methods for data analysis. In cases where information can be found to
limit the use of assumptions, this should be done. However, many cases provide inadequate information, leaving us with no choice
but to depend on assumptions. Analysis should depend on reasonable assumptions. If in actual practice assumptions are made
for the sake of doing something, decisions and results reached may lead to improper actions. The analyst should learn the art of
making appropriate or reasonable assumptions.

In the case of the example/scenario given, the details were missing, such as that the two boxers were of different weights. If
we knew, this would have helped in our analysis. Sometimes in predicting about forthcoming games, one needs to also know the
quality of opposition that the two opponents have met in the accumulation of their records. This was also missing in the example.
We will insist on the use of valid assumptions because, as we saw, wrong or invalid assumptions are likely to give inaccurate
predictions.

Types of data that are common in real life are cross-sectional data and time series data. Cross-sectional data refers to data
collected by observing many subjects (such as individuals, firms or countries/regions) at the same point in time, or without regard
to differences in time. Analysis of cross-sectional data usually consists of comparing the differences among the subjects. For
example, we want to measure current obesity levels in a population. We could draw a sample of 1,000 people randomly from that
population (also known as a cross section of that population), measure their weight and height, and calculate what percentage of
that sample is categorized as obese. Even though we may analyse cross-sectional data for quality forecasts, in this module we use
time series data.
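
To make the obesity example concrete, here is a minimal sketch in Python. The sample values, the variable names and the
BMI cut-off of 30 are illustrative assumptions, not figures from the textbook:

    # Hypothetical cross-sectional sample: each subject's weight (kg) and height (m)
    weights = [70.0, 95.5, 82.3, 61.2, 104.0]   # ... in practice, 1,000 subjects
    heights = [1.75, 1.68, 1.80, 1.60, 1.72]

    # Body mass index: BMI = weight / height^2; BMI >= 30 is a common obesity cut-off
    bmis = [w / h ** 2 for w, h in zip(weights, heights)]
    pct_obese = 100 * sum(b >= 30 for b in bmis) / len(bmis)
    print(f"{pct_obese:.1f}% of the sample is categorised as obese")

Note that time plays no role in the calculation; that is what makes the data cross-sectional rather than a time series.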

A time series is a chronological sequence of observations on a particular variable. In general, time series data are observed at
equally spaced time points. For instance, time series data can be collected hourly, daily, weekly, monthly, annually, every five
years, every ten years, etc. However, equally spaced time points are not a must.
We have to be careful when discussing time series data. If the data are listed without time specification, then we should not
consider the data to be time series.

SCENARIO
Read the following scenario carefully and make notes as we will keep on referring back to it.

Suppose that Jabulani is a milk salesperson during the week, serving the Florida, Muckleneuk and VUDEC UNISA campuses.
Very fortunately for Jabulani, his milk cows increased and his market in these campuses also increased from year to year. Jabulani’s
business runs from Mondays to Sundays. In a time series analysis a typical question would be: what can we say about the trend of
the sales? Asked differently: should we believe that the sales have a decreasing or increasing trend? It will be clear later on that
the sales levels differ according to days, high on some days and low on others. The pattern of low sales or high sales on different
days has an important connotation in time series analysis. This will be discussed.

ACTIVITY 1.4
You have done some first-year statistics modules/courses and some of you did mathematics modules as well. Let us consider
the following data sets and look at them quite closely.

Data set 1.1 16 14 19 26 11 24 10


18 15 21 24 12 21 9
21 15 20 27 13 25 11
24 17 24 31 14 27 13

Data set 1.2 16 18 21 24


14 15 15 17
19 21 20 24
26 24 27 31
11 12 13 14
24 21 25 27
10 9 11 13

(a) The two data sets have exactly the same numbers. There is something strange about their appearances though. Compare the
two data sets.

(b) Can these two data sets be classified as time series data sets? Explain.

DISCUSSION OF ACTIVITY 1.4


On whether data are time series or not
When information about the data presented is limited, there also tends to be limited feedback from any analysis made from
them. You probably realised that the rows of data set 1.1 are the same as the columns of data set 1.2 and vice versa. Or, in short,
that the data sets are transposes of each other. The data in their current form cannot be classified as time series data since no
chronological pattern of the time at which they were collected is given. This will become clearer as we proceed.
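
If you would like to verify the transpose relationship quickly, here is a minimal sketch in Python (purely illustrative):

    # Data set 1.1 as a list of rows; data set 1.2 is its transpose
    ds11 = [
        [16, 14, 19, 26, 11, 24, 10],
        [18, 15, 21, 24, 12, 21,  9],
        [21, 15, 20, 27, 13, 25, 11],
        [24, 17, 24, 31, 14, 27, 13],
    ]
    ds12 = [list(col) for col in zip(*ds11)]   # transpose: rows become columns
    print(ds12[0])   # [16, 18, 21, 24] -- the first row of data set 1.2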

Discussion
The data above do not necessarily represent time series data, but they can be presented in another way to form time series data,
provided they were collected chronologically over regular time intervals. Suppose data set 1.1 represents the sales of milk sold by
Jabulani from Monday to Sunday for four weeks. Let 1 = Monday, 2 = Tuesday, ..., 7 = Sunday as given in data set 1.3. The data
sets should therefore be presented as follows:

Data set 1.3 Litres of milk sold by Jabulani

Day
1 2 3 4 5 6 7
1 16 14 19 26 11 24 10
Week 2 18 15 21 24 12 21 9
3 21 15 20 27 13 25 11
4 24 17 24 31 14 27 13

We emphasise that in the initial presentation there was simply no information to explain or demonstrate the chronological
sequence with respect to time and that the data were therefore not time series data.

ACTIVITY 1.5
It is required to use graphs in addition to other methods to detect patterns in time series data. Graphical plots reveal information
visually, but this cannot always be done with ease. The example that follows is one of the cases where we can easily draw graphical
plots. Analyse the data about Jabulani’s business by answering the following questions. Make any comments that you believe are
relevant.

(a) Are they time series data? Justify your answer.

(b) Plot the data to reveal the pattern using the following approaches:

(i) Plot the data for each week separately.


(ii) Plot the data of all the weeks in one graphical display.
(iii) Compare the shapes of the graphs.

(c) Which plot provides us with a better idea of comparison?

DISCUSSION OF ACTIVITY 1.5


The question whether data sets form time series or not depends entirely on the form, which is the chronological order in which
the various data points should be presented. Did you answer ”yes” in question (a)? If not, what did you reveal? How did you reveal
it?

(b) Graphs of the activity

(i) Graphs for separate weeks

(ii) Graph for data of all the weeks

(iii) In terms of the pattern, the graphs reveal that milk sales were highest on Thursdays, Saturdays and Wednesdays (in
order from highest to lowest). The lowest sales were revealed for Sundays, Fridays, Tuesdays and Mondays (in the
order from lowest to highest).

(c) The graphs can be difficult to compare when they are on separate systems of axes. The last graph makes comparison very
easy, revealing that the patterns for all four weeks are similar.
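
If you have access to Python with matplotlib, the combined plot in (b)(ii) can be produced as sketched below; this is only one
possibility, and Excel or any other package mentioned earlier works just as well. The values are taken from data set 1.3:

    import matplotlib.pyplot as plt

    # Litres of milk sold by Jabulani (data set 1.3): one row per week, Monday to Sunday
    sales = [
        [16, 14, 19, 26, 11, 24, 10],   # week 1
        [18, 15, 21, 24, 12, 21,  9],   # week 2
        [21, 15, 20, 27, 13, 25, 11],   # week 3
        [24, 17, 24, 31, 14, 27, 13],   # week 4
    ]
    days = range(1, 8)   # 1 = Monday, ..., 7 = Sunday

    # Plotting all four weeks on one system of axes makes the common pattern easy to see
    for i, week in enumerate(sales, start=1):
        plt.plot(days, week, marker="o", label=f"Week {i}")
    plt.xlabel("Day (1 = Monday, ..., 7 = Sunday)")
    plt.ylabel("Litres of milk sold")
    plt.legend()
    plt.show()

The peaks on day 4 (Thursday) and day 6 (Saturday) and the troughs on day 7 (Sunday) and day 5 (Friday) appear in every
week, which is exactly the similarity the combined graph reveals.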

The patterns of the highest activity and lowest activity about a phenomenon are important in time series. Jabulani will easily
know when he does more business, when he does least business and he can plan to find better ways to improve business. Let us
start formalising these patterns.

1.1.3 Components of a time series


The components of a time series serve as the building blocks of a time series and describe its pattern.

Components are important because they enable us to see the salient features of a structure. Through them we can make
descriptions of what we need to analyse. When we deal with something that we can describe, we are better able to know the
requirements for dealing with it. A time series also has components that need to be considered and taken care of in its analysis.

Trend
The first component we discuss is trend. The term “trend” refers to the long-term decline or growth of an activity. It is defined
formally as the upward or downward movement that characterises a time series over a period of time.

Time series data may show upward trend or downward trend for a period of years. This may be due to factors such as increase
in population, change in technological progress, large scale shift in consumers’ demands, and so on. For example, population
increases over a period of time, price increases over a period of years, production of goods on the capital market of the country
increases over a period of years. These are examples of an upward trend. The sales of a commodity may decrease over a period of
time because better products come to the market. This is an example of a declining or downward trend. The increase or decrease
in the movements of a time series is called the trend.

Usually one cannot determine from looking at the raw data whether there is a decreasing or increasing trend. There are times
(but rarely) when we can see the pattern by inspection. Often a graphical plot clearly shows the trend. The trend may take shapes
such as linear, exponential, logarithmic, polynomial, power function, quadratic, and other forms. In general, we use graphical
displays to find out if there is a decline or increase in the activity; a short code sketch illustrating this follows the examples
below. Some examples of trend applications are the following:

- Technological changes in the industry


Currently, companies increase ICT usage in their activities to gain a competitive edge over those that do not incorporate it.
Institutions of higher learning, especially distance education institutions, have aggressively incorporated ICT in facilitating
learning.

- Changes in consumer tastes


Housing is very expensive and scarce, but for obvious reasons remains a priority for households. Recently, cities such as
Cape Town, Durban, East London, Johannesburg, Port Elizabeth and Pretoria have experienced a high influx of people from
other areas, and employment is biased towards the youth. As a result housing in these cities is biased towards townhouses
and flats.

- Increases in total population


There is an increase since there are more births than deaths. In SA, there is also an influx of people from other countries.
In other countries, natural deaths and deaths resulting from holocausts, wars, terrorism and natural disasters such as tsunamis
have been many, but still far fewer than the births that have occurred over the years. That is why there is an increase in the
world’s population.

- Market growth
In Gauteng, the market for umbrellas decreases in the period April to July. During the rainy season, which in Gauteng
happens to be the summer season, the sales of umbrellas increase.

- Inflation or deflation (price changes)


If we consider one item for simplicity: maize is produced, approximately, in the period October to May. Early in this period
the price of maize is high, because more people are looking for a commodity that is less available. In the period November to
January maize is in abundance and prices drop. As the production level declines, the prices start increasing again.

Cycle
The next component of a time series that we discuss is the “cycle”. Once trends have been identified, there may be some recurring
up and down movements visible around the trend levels. These movements are called cycles, and they occur over the medium
and long term. Note that natural occurrences have generally shown cyclical patterns over the years. Examples are pandemic
diseases that recur after a certain number of years (the Black Plague, the Spanish flu, Covid-19, etc.), producing observable
spikes in the number of deaths.

The impact of cycles on a time series is either to stimulate or depress its activity, but in general, their causes are difficult to
identify and explain. Certain actions by institutions such as government, trade unions, world organisations, and so on, can induce
levels of pessimism and optimism into the economy which are reflected in changes in the time series levels. Economic indices are
usually used to describe cyclical fluctuations.

Cyclical variations are recurrent upward or downward movements in a time series, but the period of a cycle is greater than a
year. This restriction makes cycles different from seasonal variations; cyclical variations are also not as regular as seasonal
variations. There are different types of cycles, varying in length and size. The ups and downs in business activities are the
effects of cyclical variation. A business cycle showing these oscillatory movements passes through four phases: prosperity,
recession, depression and recovery. In a business, these four phases are completed by passing from one to another in this order.
Together, they form a cycle.

Cycles are useful in long-term forecasting, where “long-term” can mean centuries or millennia. Our capabilities and interests
in this module do not require us to look beyond a decade. Hence, methods for developing forecasts that include cycles (or
cyclical components) are beyond the scope of this module. However, you still need to recognise when cycles are discussed or
implied in a forecasting situation.

Seasonality
The example about milk above dealt with weekly periods. Generally, seasonal variations are periodic patterns in a time series
that complete themselves within a calendar year and are then repeated on a yearly basis. This gives the impression that the
observations being investigated must run over a year, which is simply not the case: even values occurring within a single day
can be seasonal, as you will soon see. First, we provide a more useful and realistic definition of seasonality, which will be used
in this module; the definition above applies when the periods span whole years. Let us define the concept as follows:

Seasonal variations are systematic variations that occur within a period and are tied to some properties of that period. They
are repeated within the period. They are indeed periodic patterns in a time series that complete themselves within a calendar
period and are repeated on the basis of that period.

Seasonal variations are short-term fluctuations in a time series which occur periodically within a period, such as a year, in
which case they are repeated year after year. The major factors responsible for the repetitive pattern of seasonal variations are
weather conditions and the customs of people. More woollen clothes are sold in winter than in summer. Regardless of the trend,
we can observe that in each year more ice cream is sold in summer and very little in winter. Sales in department stores are
higher during festive seasons than on normal days.

Irregular fluctuations
We have not mentioned whether Jabulani was ever robbed of his revenue or stock for his business. Robbery is not a regular or
seasonal event, but can suddenly happen.

Irregular fluctuations are variations in time series that are short in duration, erratic in nature and follow no regularity in the
occurrence pattern. These variations are also referred to as residual variations since by definition they represent what is left out in a
time series after trend, cyclical and seasonal variations have been accounted for. Irregular fluctuations result due to the occurrence
of unforeseen events like floods, earthquakes, wars, famines, and so on.

Remember that Jabulani was a smart entrepreneur who would make some estimations of revenue each morning he left for work.
One Tuesday afternoon after he had counted what he thought was his revenue for the day, he was robbed by two thugs. Fortunately
he was neither hurt nor discouraged to continue with his business. It was happening for the first time. Could he have anticipated
being robbed on that day? We also could not have predicted that event.

The point is that this irregular event changed what could have been the revenue and/or profit for that day. In time series,
irregular fluctuations, also called irregular variations, refer to random fluctuations attributed to unpredictable occurrences.
These patterns cannot be accounted for; they are once-off events. Examples are natural disasters (such as fires, droughts and
floods) or man-made disasters (strikes, boycotts, accidents, acts of violence and so on).

Note that all the components of a time series influence the time series and can occur in any combination. The most important
problem to be solved in forecasting is trying to match the appropriate model to the pattern of the time series data.

ACTIVITY 1.6
Discuss what a time series is, and discuss the meaning of trend effects, seasonal variations, cyclical variations, and irregular
effects.

DISCUSSION OF ACTIVITY 1.6


You should mention a sequence of observations of a variable presented in chronological form when you describe a time series.
Trend should imply a long-term tendency of that time series. Seasonality should include a periodic pattern in the data. Describing
cycles should imply up and down movements of observations around trend levels. Irregular pattern is the portion of the time series
which cannot be accounted for by the three patterns discussed above.

Exploration data set


The next data set is important for exploration. ENJOY IT. It represents the litres of milk that were demanded from Jabulani.
Whether there was stock or not is not an issue here. This data set will be revisited time and again.
Data set 1.4

Day       1   2   3   4   5   6   7
Week 1   16  14  19  26  11  24  10
Week 2   18  15  21  24  12  21   9
Week 3   21  15  20  27  13  25  11
Week 4   24  17  24  31  14  27  13
In general, forecasting methods that depend on non-numeric information are qualitative forecasting methods. (Do you re-
member this from first-year Statistics?) Qualitative data are nominal (word) data. Quantitative forecasting methods, on the
other hand, depend on numerical data. The prescribed textbook presents further examples of trend, cyclical, seasonal and
irregular variations.

1.2 Forecasting Methods
There are several forecasting methods, but there is no single best forecasting method. There are, however, methods appropriate to
any given time series situation. The forecasting methods are described in the prescribed textbook along the same lines as the types
of data that you dealt with in your first-year Statistics modules: they are qualitative and quantitative in nature.

1.2.1 Qualitative methods


Qualitative forecasting methods become an option for developing forecasts in situations where there are no historical numeric data
or where statisticians trained in time series are not available. Opinions of experts are generally used to make predictions in such
cases. Predictions are necessary in all situations, even where there are no data; when this occurs, qualitative methods are used.

Common examples of qualitative forecasting methods are judgemental methods, which incorporate intuitive judgements,
opinions and subjective probability estimates. Popular qualitative forecasting methods are the following:

● Composite forecasts

● Surveys

● Delphi method

● Scenario building

● Technology forecasting

● Forecast by analogy

You do not need to learn more about these for the requirements of this module. However, you may come across them in
applications. Hence, your encounter with them may be of help in future applications.

1.2.2 Quantitative methods


Quantitative forecasting methods are used (and only possible) when historical data that occur in numeric form are available. These
methods may occur as univariate forecasting methods or as causal methods.
Univariate forecasting methods depend only on past values of the time series to predict future values. In these methods, data
patterns are identified from historical data, it is assumed that the patterns will continue in the future, and the patterns are then
extrapolated in order to develop forecasts.

Causal forecasting models start by identifying variables that are related to the one to be predicted. This is followed by forming
a statistical model that describes the relationship between these variables and the variable to be forecast. The common ones are
regression models and ordinary polynomials.
In the causal forecasting method, the variable of interest, which is the one whose forecasts are required, depends on other
variables. It is thus the dependent variable. The ones on which the variable of interest depends are known as the independent
variables.

Discussion about dependence/independence


Note that Jabulani's customers are mostly people who receive wages on a weekly basis. Some are paid on Saturday afternoon,
but an overwhelming majority is paid on Friday afternoon. In addition, on Saturday afternoon there is an item P that is also liked
by many milk buyers. If item P is available before the milk arrives, it is bought in large quantities, leaving limited disposable
income for milk purchases. Fortunately for Jabulani, he has, in the past four weeks, managed to deliver milk before item P was
delivered. However, most of the buyers who are paid on Saturday tend to meet the P seller before their milk purchases on Sunday
morning.

It is necessary to understand dependencies and correlations when dealing with forecasting. If you fail to understand them, you
may make wrong assumptions: influences that affect your forecasts, and the constraints that come with correlated variables,
can lead to inaccurate models and hence to wrong forecasts.

Useful common examples are time series and causal methods. There are others as well, but the following may be of help in
your development.

Time series methods


Time series methods use historical data as the basis of estimating future outcomes.

• Rolling forecast: a projection into the future based on past performance, routinely updated on a regular schedule to
incorporate new data.

• Moving average

• Exponential smoothing

• Extrapolation

• Linear prediction

• Trend estimation

• Growth curve

Causal / econometric methods


Some forecasting methods use the assumption that it is possible to identify the underlying factors that might influence the
variable that is being forecasted. For example, sales of umbrellas might be associated with weather conditions. If the causes are
understood, projections of the influencing variables can be made and used in the forecast.
• Regression analysis using linear regression or non-linear regression
• Autoregressive moving average (ARMA)
• Autoregressive integrated moving average (ARIMA), e.g. Box-Jenkins
• Econometrics

Other methods
• Simulation
• Prediction market
• Probabilistic forecasting and ensemble forecasting
• Reference class forecasting

These methods are given to you so that when you make references from other forecasting sources, you will be able to understand
where they belong in your module. However, they are not necessarily required to the extent that is presented in those other sources.

ACTIVITY 1.7

● Do you see any dependence of the variables in the example of Jabulani’s milk-selling business above?

Hint: Focus on milk purchases and disposable income.

DISCUSSION OF ACTIVITY 1.7


As suggested by the hint, the purchase of an item that is in high demand depends on the availability of disposable income.

ACTIVITY 1.8

(a) Classify the milk sales in the latest scenario as a dependent or independent variable.

(b) Explain your choice in (a) above. Here confine your response to milk purchases and disposable income.

(c) Identify the dependent variable and the independent variable.

DISCUSSION OF ACTIVITY 1.8


Regarding (a), milk sales depend on the availability of disposable income. Hence, (b) milk sales represent the dependent
variable. This leads to (c) that sales are the dependent variable and disposable income is the independent variable.

1.3 Errors in forecasting and forecast accuracy


When it was said that the pattern of information given, such as Jabulani’s milk sales, can help you make future predictions, no one
said your predictions would be perfect.

It is time to note that if the forecasts we prepare are not accurate, they may be useless, since they will probably mislead the user.
When we insist on a scientific method in forecasting, it is to ensure that we can monitor the methods and test the models so that
their inaccuracies are reduced or, ideally, eliminated.

It is important to know the likely errors when you attempt to make predictions or develop forecasts. If you know them, you can
avoid or minimise them. Error is as simple as when you thought Jabulani was going to sell 500 litres in a specific week and he ends
up selling 520 litres. (Note that you could make an error in litres of milk by overestimating as well.)

The next sections require your learned skill of drawing graphs and interpreting them. The most common ones you should expect
to encounter (draw and interpret) are scatter diagram (or scatterplot) and time plot. Revise them if you have already forgotten how
they are drawn.

Further, you are soon going to engage in a number of calculations. Thus, ensure that you are ready to perform them, and that
you remember descriptive statistics you learnt in your early years of Statistics. It is also very important to be able to know why the
calculations are necessary in any exercise of building a forecast model.

There are two types of forecasts: the point forecast and the prediction interval. A point forecast is a single number that estimates
the actual observation. A prediction interval is a range of values that gives us some confidence that the actual value is contained in
the interval.

The forecast error requires that the estimate be found and be “paired” with the actual observation.

In statistics, a forecast error is the difference between the actual or real and the predicted or forecast value of a time series or
any other phenomenon of interest. In simple cases, a forecast is compared with an outcome at a single time-point and a summary
of forecast errors is constructed over a collection of such time-points. Here the forecast may be assessed using the difference or
using a proportional error. By convention, the error is defined using the value of the outcome minus the value of the forecast. In
other cases, a forecast may consist of predicted values over a number of lead-times; in this case an assessment of forecast error
may need to consider more general ways of assessing the match between the time-profiles of the forecast and the outcome. If a
main application of the forecast is to predict when certain thresholds will be crossed, one possible way of assessing the forecast
is to use the timing-error—the difference in time between when the outcome crosses the threshold and when the forecast does so.
When there is interest in the maximum value being reached, assessment of forecasts can be done using any of:

• the difference of times of the peaks;

• the difference in the peak values in the forecast and outcome;

• the difference between the peak value of the outcome and the value forecast for that time point.

Forecast error can be a calendar forecast error or a cross-sectional forecast error, when we want to summarize the forecast error
over a group of units. If we observe the average forecast error for a time-series of forecasts for the same product or phenomenon,
then we call this a calendar forecast error or time-series forecast error. If we observe this for multiple products for the same period,
then this is a cross-sectional performance error.

To calculate the forecast errors we subtract the estimates (ŷi) from the actual observations (yi). The difference is the forecast
error. Can you tell what the values of the forecast errors imply? For example, some may be smaller than others, some negative and
others positive!

When Jabulani plans his sales, he makes some estimation of litres of milk that he hopes to sell. In Week 3 prior to getting to
the market, he had made the following estimations (ŷi ):

Week 3 estimated litres of milk (ŷ):

Day            1   2   3   4   5   6   7
Estimate (ŷi) 27  11  20  26  14  22   9

Remember to refer to the appropriate week of the table of Data set 1.4 for observed values (yi ).

ACTIVITY 1.9

(a) On which days was there overestimation?

(b) On which days was there underestimation?

(c) Calculate the forecast errors for these estimates.

(d) Identify the day on which the milk sales were most disappointing! Explain.

(e) On which day did he make the best prediction? Why?

DISCUSSION OF ACTIVITY 1.9
We have not defined the terms overestimation and underestimation formally. They have been defined in other modules, but we
wish to remind you. If you make a prediction and the actual observation turns out to be smaller, we have overestimated.
What is the sign of the forecast error in that case? Can you now define the term “underestimation”? What is the sign of the
forecast error then? Let us get into the questions of the activity. The setup of Week 3 is as follows:

Actual observations (yi)     21 15 20 27 13 25 11

Estimated observations (ŷi)  27 11 20 26 14 22  9

(a) Overestimations are visible after pairing by observing the pairs in which the actual observations are lower than the estimates.
These were on Day 1 and Day 5.

(b) Underestimations occurred on Day 2, Day 4, Day 6 and Day 7.

(c) The forecast errors are −6, 4, 0, 1, −1, 3 and 2 for the seven days, respectively.

(d) Day 1 was the most disappointing. This is because Jabulani expected to sell 27 litres but sold only 21 litres. It is the day
with the forecast error of largest magnitude (a negative error, i.e., unsold stock).

(e) He made the best prediction on Day 3, where the sales were equal to the estimates.

If there was no day when the sales and estimates were equal, then the day with the smallest forecast error in absolute value
would have been the one on which the best prediction was made. This means that Day 4 and Day 5 are the days on which good
predictions were made. However, we note that Day 5 was not a happy day for the seller because some stock was left unsold whereas
on Day 4, all stock was sold and one customer did not get milk.
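For readers who like to verify such calculations on a computer, here is a minimal Python sketch (Python is not required for this
module; Excel or a calculator works equally well). It recomputes the Week 3 forecast errors and classifies the days as over- or
underestimated:

# Week 3 actuals and estimates from Data set 1.4
y     = [21, 15, 20, 27, 13, 25, 11]   # actual observations y_i
y_hat = [27, 11, 20, 26, 14, 22, 9]    # estimates yhat_i

errors = [yi - fi for yi, fi in zip(y, y_hat)]
print(errors)  # [-6, 4, 0, 1, -1, 3, 2]

# A negative error means overestimation (estimate above the actual value);
# a positive error means underestimation.
over  = [day + 1 for day, e in enumerate(errors) if e < 0]
under = [day + 1 for day, e in enumerate(errors) if e > 0]
print(over, under)  # [1, 5] [2, 4, 6, 7]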
Examining the forecast errors over time provides some information on the accuracy of the estimates.

- Random forecast errors demonstrate that patterns that existed in the data were considered when the estimates were made.

- If there is an increasing (or decreasing) trend, and in making an estimation this trend was not taken care of, then the scatter
plot of forecast errors would reveal an increasing (or decreasing) trend.

- If estimates of seasonal data did not account for seasonality, the scatter plot of forecast errors would reveal the seasonal
pattern that was not taken care of.

- Similar arguments hold for cyclical data.

Deeper discussions and illustrations can be found in the prescribed textbook.

ACTIVITY 1.10

(a) Plot the forecast errors calculated in Activity 1.9.

(b) Do the data reveal any pattern that was not accounted for?

DISCUSSION OF ACTIVITY 1.10

(a) The plot is not difficult to draw. The forecast errors to be used were calculated in Activity 1.9. They are

Forecast errors (ei) −6 4 0 1 −1 3 2

Plot of forecast errors of Activity 1.9

(b) The plot looks almost random. This means that the forecasting technique provides a good fit to the data.

1.3.1 Absolute deviation


Forecast errors are used to calculate absolute deviations. The absolute deviation requires the forecast errors in absolute terms, i.e.,
a matter of “how far is the estimate from the actual observation”.

ACTIVITY 1.11
Calculate the absolute deviations for the estimates in Activity 1.9.

DISCUSSION OF ACTIVITY 1.11


The calculation is fairly straightforward. We need the forecast errors, which were calculated as

Forecast errors (ei) −6 4 0 1 −1 3 2

The absolute deviations are the absolute values of the forecast errors, which we can recall from our high-school days. The
absolute deviations are thus

Absolute deviations (∣ei∣) 6 4 0 1 1 3 2

1.3.2 Mean absolute deviation


The absolute deviations give us the mean absolute deviation (MAD) when we obtain their average in the usual way. The MAD
requires the following steps: take the absolute deviations, add them, divide the sum by their number and the result in the MAD.

ACTIVITY 1.12

Calculate the MAD for the estimates in Activity 1.9.


DISCUSSION OF ACTIVITY 1.12

Absolute deviations (∣ei ∣) 6 4 0 1 1 3 2

The MAD is therefore

MAD = (∑ ∣ei∣)/n = 17/7 = 2.42857,

where the sum runs over i = 1, . . . , 7.

1.3.3 Squared error


Another way to remove the distinction between positive and negative errors is to square the errors.

ACTIVITY 1.13

Calculate the squared errors for the estimates in Activity 1.9.

DISCUSSION OF ACTIVITY 1.13

Forecast errors (ei ) −6 4 0 1 −1 3 2

The squared errors are therefore

Squared errors (ei²) 36 16 0 1 1 9 4

1.3.4 Mean squared error


The MSE is the average of the squared errors.

ACTIVITY 1.14
Calculate the MSE for the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.14
To calculate the MSE we need the squared errors, which were calculated as

Squared errors (ei²) 36 16 0 1 1 9 4

The MSE is therefore

MSE = (∑ ei²)/n = 67/7 = 9.57143,

where the sum runs over i = 1, . . . , 7. (Note: the sum of the squared errors is 36 + 16 + 0 + 1 + 1 + 9 + 4 = 67.)

Now, let us pause a little. We have done a few useful calculations. We have also answered a few questions about errors.
Do you recall the value of the forecast error on the day that the estimate was perfect? Do you also see what is meant by a poor
estimate? Now can you say what is meant by a good estimate? You will recall that the errors need to be as small as possible. So
far it is not absolutely clear what “small” entails.
The MAD and the MSE are the measures we will use to determine whether the errors are small, which indicates a good model.
The objective is to select a good forecast model: the model selected must produce forecasts that are close to the actual
observations. The MAD and the MSE will serve as our tools for selecting a forecast model.
We need to understand the MAD and the MSE as they relate to the forecast model. The steps are as follows:

MAD steps MSE steps


Calculate forecast errors Calculate forecast errors
Determine absolute deviations Determine squared errors
Add the absolute deviations Add the squared errors
Divide by their number Divide by their number

MAD is not in any way “mad”. It is an objective route to good forecasting. The MSE serves the same purpose.
Sometimes the effectiveness of a model is measured in percentages. Such measures are the absolute percentage error (APE)
and the mean absolute percentage error.

1.3.5 Absolute percentage error (APE)


The APE is the absolute error divided by the corresponding actual observation, multiplied by 100.

ACTIVITY 1.15

Calculate the APE for the estimates in Activity 1.9.

DISCUSSION OF ACTIVITY 1.15

To calculate the APE we need the absolute errors and the actual observations, which are

Absolute deviations (∣ei ∣) 6 4 0 1 1 3 2


Actual observations (yi ) 21 15 20 27 13 25 11

The APEs are therefore

AP Ei 28.5714 26.6667 0.00 3.7037 7.6923 12.00 18.1818

1.3.6 Mean absolute percentage error (MAPE)


MAPE is the mean of the APEs. It is defined as

MAPE = (∑ APEi)/n,

where the sum runs over i = 1, . . . , n.

ACTIVITY 1.16

Calculate the MAPE corresponding to the estimates in Activity 1.9.

DISCUSSION OF ACTIVITY 1.16

To calculate the MAPE we need the APEs, which are

AP Ei 28.5714 26.6667 0.00 3.7037 7.6923 12.00 18.1818

We obtain

∑ APEi = 96.8159, summing over i = 1, . . . , 7.

The MAPE is therefore

MAPE = 96.8159/7 = 13.8308.

The intention when measuring the error is to monitor and control it, reducing it so as to increase the accuracy of the forecasting method.

1.3.7 Forecasting accuracy


This section summarises the ‘errors in forecasting’ methods presented above and presents them as the level of accuracy achieved.
It is important to know that forecast accuracy starts with the forecast error. As you have seen, the forecast error is the difference
between the actual value and the forecast value for the corresponding period:

et = yt − Ft (1.1)

where et is the forecast error at period t, yt is the actual value at period t, and Ft is the forecast for period t. The summary of
the statistics is given in the next table.

Measures of aggregate error (in each sum the index runs over t = 1, . . . , n):

Mean Absolute Deviation (MAD):          MAD = (∑ ∣et∣)/n

Mean Absolute Percentage Error (MAPE):  MAPE = [(∑ ∣et/yt∣)/n] × 100

Mean Squared Error (MSE):               MSE = (∑ et²)/n

Root Mean Squared Error (RMSE):         RMSE = √[(∑ et²)/n]
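As a quick check of the worked values above, here is a minimal Python sketch (an optional aid; the module itself requires only a
calculator or Excel) implementing the four measures for the Week 3 milk data:

import math

def mad(e):   return sum(abs(x) for x in e) / len(e)
def mse(e):   return sum(x * x for x in e) / len(e)
def rmse(e):  return math.sqrt(mse(e))
def mape(e, y):  # in percent, matching the APE definition above
    return sum(abs(ei / yi) for ei, yi in zip(e, y)) / len(e) * 100

y = [21, 15, 20, 27, 13, 25, 11]   # Week 3 actuals
e = [-6, 4, 0, 1, -1, 3, 2]        # Week 3 forecast errors

print(round(mad(e), 5))      # 2.42857
print(round(mse(e), 5))      # 9.57143
print(round(rmse(e), 5))     # 3.09377
print(round(mape(e, y), 4))  # 13.8308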

Please note that business forecasters and practitioners sometimes use different terminology in industry. They may refer to the
PMAD as the MAPE, although what they compute is a volume-weighted MAPE. Please stick to the textbook notation.

1.4 Choosing a forecasting technique


We have to learn various forecasting techniques, as well as how to choose among them, for a number of good reasons. If you
know several techniques but do not know how to decide on the appropriate one, you may end up forecasting with an unsuitable
method. Equally, if you know only one method, you will use it in every forecasting exercise, even where it does not fit. The
measures defined in the previous section will be needed when selecting a model.
In this section we discuss important features in the selection process.

1.4.1 Factors to consider


The following list of factors needs to be considered when a forecasting method is selected:

- Time frame
A forecasting method may take a long or short time to develop. The time frames, or the time horizons, are short, medium, or
long.

- Data patterns
The patterns we identified in the earlier discussion are trend, cycle and seasonality. If a forecasting situation requires a pattern
that the method does not take into account, then the method becomes inappropriate.

- Forecasting cost
Costs could be the money or skills needed to develop a forecasting method. If the cost of developing forecasts is higher
than the benefits, a cheaper method must be used or forecasts should not be developed. Also, the more complex forecasting
methods are more expensive to develop while simple ones are usually less expensive.

- Desired accuracy
Obviously, it is ideal that forecasts be perfectly accurate. Some situations require the best possible accuracy level because of
their high sensitivity. For example, forecasts about life-threatening conditions such as HIV/AIDS, typhoid and cholera require
the best possible accuracy because lives are at risk.

- Data availability
When there are no numeric data or no detail, we cannot develop quantitative forecasts. Some situations though, may have
limited data, or data of a form that is not required. The forecaster will have to accommodate the data and choose an
appropriate method that will suit the data even though it is not ideal for the problem. We are warned that forecasting methods
give inaccurate forecasts if inaccurate, outdated or irrelevant data are used to develop the forecasts.

- Convenience
Convenience in this case means the ease of use by the forecasters, as well as their understanding of the method. If the
forecaster does not understand the method he or she uses, then not much confidence can be placed in the forecasts.

ACTIVITY 1.17

Suppose that you are to develop forecasts for the number of tourists using the services of a tourism organisation in the country.
You are given data of the number of tourists using these services for the last five years, and they have been increasing annually.
You also realise from the graphs provided that in the months of January, March, June and December the tourists used this company
even more.

(a) As a time series specialist you are requested to develop forecasts and the marketing manager insists on a specific method.
How would you react?

(b) Is the pattern of the data clear? Explain.

DISCUSSION OF ACTIVITY 1.17

(a) One should not hesitate to differ from the marketing manager by refusing to use the method he or she prescribed. When
using the method, the user needs to be able to explain the rationale for it. The marketing manager must give reasons for the
choice, and these reasons must be consistent with the time series methodology. The method must be able to account for the
high tourism numbers in January, March, June and December. It must also be able to show the increasing numbers.

(b) The patterns are clear. The four months with high tourist numbers indicate seasonality while the increasing numbers indicate
an increasing trend.

1.4.2 Strike the balance


The forecasting technique chosen should balance the factors we discussed. The situation will dictate the weight to be given to the
factors.

ACTIVITY 1.18
Develop a forecast model to predict the milk sales of Jabulani’s business (Data set 1.4).

(a) Explain the patterns that exist from the record presented.
Hint: Take note of the seasonality pattern.

(b) If we assume that the display in the past four weeks will recur, can we expect growth in this business? Explain.

DISCUSSION OF ACTIVITY 1.18

As per the explanations given, the data period is not enough to warrant the existence of cycles. The irregular component also,
by definition, cannot be accounted for. Hence (a) requires examination of trend and seasonality.
Plots of data set 1.4

Figure (a) Examination of the trend

Here the data for the different weeks were combined so that the trend can be examined. There is an increasing trend that is
demonstrated by the trend line.
Can we determine the rate of increase? Here, the rate of increase is given by the equation of the trend line. You must be able to
show that the equation of the trend line is

y = 0.1571x + 16.365.
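If you wish to verify this equation, the following Python sketch (optional; Excel's trendline feature gives the same result) fits a
straight line to the 28 daily sales of Data set 1.4, numbering the days x = 1, ..., 28 across the four weeks:

import numpy as np

y = np.array([16, 14, 19, 26, 11, 24, 10,   # week 1
              18, 15, 21, 24, 12, 21,  9,   # week 2
              21, 15, 20, 27, 13, 25, 11,   # week 3
              24, 17, 24, 31, 14, 27, 13])  # week 4
x = np.arange(1, 29)

slope, intercept = np.polyfit(x, y, 1)       # least-squares straight line
print(round(slope, 4), round(intercept, 3))  # 0.1571 16.365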

Figure (b) Examination of seasonal patterns

The milk sales are clearly high on Day 4 and Day 6 for all the weeks and low on Day 2, Day 5 and Day 7. Therefore, from the
graph, the seasonal pattern is very evident in the data set.

1.5 An overview of quantitative forecasting techniques


Points on regression analysis
Regression analysis is an important topic that requires adequate attention in the discipline of Statistics. Time series also tends
to use regression analysis in model building for some forecasting problems. We discuss some necessary points here, but for full
knowledge on regression analysis, it is better to enrol for a module in Regression Analysis.

Regression analysis relates variables through the use of linear equations. It is a statistical methodology with a wide range of
applications. The variable of interest, denoted by Y, is made the subject of the formula. It is the dependent variable because it is
modelled as a function of the variables on which it depends. It is also called the response variable because it responds in a certain
way when the other variables are changed.
The variables that are related to the response variable are the independent variables that are allowed to vary within their feasible
values. These are the predictor variables often denoted by X1 , X2 , ..., Xk .
The objective of regression is to build a regression model, which is a prediction equation that relates Y to X1 , X2 , ..., Xk . This
model is used to describe, predict and control Y on the basis of the predictor variables.
Depending on the application needed to address a problem, a regression model can use quantitative independent variables (that
assume numbers) or qualitative independent variables (that assume non-numerical values).

This module requires full manipulation of simple linear regression models and some applications of multiple regression mod-
els. In addition to the regression models, the scope of the module covers time series, decomposition methods and exponential
smoothing.

1.6 Conclusion
We have acquired useful introductory knowledge for the module. We defined forecasting, explained its necessity, and described
qualitative and quantitative forecasting methods. Time series data were discussed and their components explained; errors in
forecasting were defined, as well as measures to detect them. Factors for choosing a forecasting technique were discussed, and
the use of regression analysis in forecasting was covered briefly. As we know the value of exercises, the next exercises are
intended to make you “fit” for the tasks ahead.

Self-evaluation exercises

Do exercises 1.1 up to 1.6 on page 25 of Bowerman et al. (2005).


If you encounter any problems with these exercises, do not hesitate to contact your lecturer. Just indicate what is difficult for
you.
You are welcome to discuss your solutions with the lecturer, and you are encouraged to do so by sending these solutions directly
to the lecturer(s) of the module.

Unit 2

Model Building and Residual Analysis

2.1 Introduction
In order to do a good job, we need to be well equipped for it. In simple terms, we need the knowledge to do the job as well as
the facilities or tools to use. To build a house, we need a good foundation and we need to be able to construct walls. The walls
would normally require bricks, which are laid solidly against one another and glued by cement; the cement is mixed with specific
proportions of water and sand, and specific skills are required for an effective mix. A mistake in one of these may lead to bad
results, which may reveal themselves only some years after construction. Developing a forecast also requires a mix of knowledge.
Fortunately for us, when forecasting is done, there are also tests and measures to indicate whether the forecasts can be trusted.
Good forecasts will represent the actual truth well, with no or minor deviations. On the other hand, bad forecasts would mislead
the forecaster completely.
We need to know the future so that we can plan for it. If you remember the example of milk sales in Unit 1, Thursdays were
good days for business, and there was almost always more stock of milk to cater for the increased market. If the predictions were
inaccurate, it could happen that there was too little stock when demand was high.
This study unit focuses on model building and some important aspects of residual analysis. The main purpose of the unit is to
learn to build forecasting models, while residual analysis measures the accuracy of the model. In particular, the unit focuses on:

• Build a regression model by hand or using Excel and interpret the results

• Assess goodness-of-fit, identify outliers and influential points using residual analysis

• Assess multicollinearity using the variance inflation factor.

Outcomes table for the study unit

Outcomes: at the end of the unit you should be able to solve problems on the topics below.

- Multicollinearity. Assessment: analyse the covariance matrix. Content: correlations, variance inflation. Activities: calculate,
test hypotheses. Feedback: discuss each activity.

- Comparison of regression models. Assessment: use selected measures. Content: R², adjusted R², s, C-statistic. Activities:
perform calculations. Feedback: explain the calculations.

- Residual analysis. Assessment: use plots, assumed forms. Content: residual plots. Activities: calculations, graph plotting.
Feedback: link with the patterns.

- Diagnostics. Assessment: compare measures with limits. Content: leverage points, residuals. Activities: calculate measures,
plot graphs. Feedback: calculate and discuss the measures.

Where there are concepts that are necessary for us to learn a skill, we will look for the skills wherever they are in the book.
As an example, R2 appears in earlier chapters before Chapter 5. Many of these concepts were dealt with in first-year Statistics.
Fortunately they are all in the fifth chapter of the prescribed book.

We will deal with the following concepts:

● multicollinearity with reference to the variance inflation factor

● R2

● adjusted R2

● the standard error s

● the C-statistic

● stepwise regression and backward elimination (read for interest's sake only, not for examination purposes)

● residual plots

● the constant variance assumption

● the assumption of correct functional form

● the normality assumption

● the independence assumption

● the leverage values

● all kinds of residuals

● Cook’s distance

● outlying and influential values

Some explanations
Time series data in this study unit shall consist predominantly of numeric data collected over regular intervals. Similar to
building a house on a good solid foundation, with intact walls and roof, in forecasting you also need an appropriate framework to
use your data wisely and then develop useful (and not misleading) forecasts.

The four basic steps for this are as follows:


Step 1: Specify a tentative model.
Step 2: Estimate any unknown parameters.
Step 3: Validate the model.
Step 4: Develop the required forecasts.
In forecasting using time series, model building is the foundation. The model is an equation with unknown parameters and
accompanying assumptions. If the parameter estimates are wrong, the model will not provide correct predictions.
In addition, when a statistical analysis of a time series has been completed, we will often find that there exist relationships
between the variables of interest. It is important to know what to do with these relationships, otherwise we may build models that
do not represent the actual pattern of the activity. The next topic explains this aspect of relationships.

2.2 Multicollinearity
We learnt about the correlation coefficient in first-year Statistics. When more than two variables are considered, the correlation
coefficient is generalised to the correlation matrix. We also came across the coefficient of determination when we studied regression.
The correlation coefficient and the coefficient of determination are useful in measuring multicollinearity.
We know from regression analysis that we may express a variable of interest (dependent variable) as a function of other variables
(independent variables). When two independent variables are related, there is collinearity. If more than two independent variables
are related, there is multicollinearity. An extreme case of multicollinearity is singularity, in which an independent variable is
perfectly predicted by another independent variable (or more than one). Do you recall the value of the correlation measure under
perfect correlation? Justify your answer.

ACTIVITY 2.1
Provide an example of a real-life case where multicollinearity can exist.

DISCUSSION OF ACTIVITY 2.1


This seems to be a difficult question at first glance, but no doubt it is very interesting. Let us take an easy example as follows.

Define: Y = productivity of the workforce

X1 = approach used by management in motivating staff

X2 = training received by staff

Surely, Y depends on X1 and X2. It is also reasonable to believe that X1 and X2 are correlated: motivation and training of staff
are obviously related variables. Do you have any counter-argument to this assertion? Think of other examples. Your examples
need not be in the form of mathematical equations; they should just get you thinking.

2.2.1 The variance inflation factor (VIF)
We studied variances in the first year, and this gave us an idea of variation. Another term we hear about, in economics, is
inflation. In the current discussion we are not going to discuss economics, just in case you think it refers to that.
The variance inflation factor (VIF) is the measure we will use to determine the extent of multicollinearity. It is defined as
follows:

Consider a regression model relating a response variable Y to a set of predictor variables X1 , X2 , . . . , Xj−1 , Xj , Xj+1 , . . . ,
Xk . The variance inflation factor VIFj for the predictor variable Xj in this set is defined by

VIFj = 1/(1 − Rj²)

where Rj² is the multiple coefficient of determination for the regression model that relates Xj to all the other predictor variables
X1 , X2 , . . . , Xj−1 , Xj+1 , . . . , Xk .
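To make the definition concrete, here is a minimal Python sketch of the VIF calculation. The data are hypothetical (generated at
random, with X2 deliberately constructed to be correlated with X1); they are not from the textbook:

import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)  # deliberately correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # Regress X_j on the remaining predictors and apply VIF_j = 1/(1 - R_j^2)
    xj = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(xj)), others])   # add an intercept
    fitted = A @ np.linalg.lstsq(A, xj, rcond=None)[0]
    r2 = 1 - np.sum((xj - fitted) ** 2) / np.sum((xj - xj.mean()) ** 2)
    return 1 / (1 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print([round(v, 2) for v in vifs])  # expect large VIFs for x1 and x2; x3 near 1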

ACTIVITY 2.2
Calculate the VIF for the Wednesday in the data set relating to milk sales in unit 1.

Recall that in Data set 1.4 in Unit 1 we had the following data for Week 3:

Day 1 2 3 4 5 6 7
yi 21 15 20 27 13 25 11
ŷi 27 11 20 26 14 22 9

Hint: Recall the multiple coefficient of determination.

DISCUSSION OF ACTIVITY 2.2


We need to have more than one independent variable. In the above case there is only one. The VIF cannot be defined here. Did
the exercise make you think?

ACTIVITY 2.3

Suppose that you are given the following data together with the corresponding estimates.

y 39 41 33 45 29 42 21
ŷ-estimates 36.1 33.9 37.3 40.2 31.7 38.9 34.8

Calculate the coefficient of determination for the data.

DISCUSSION OF ACTIVITY 2.3

We use Excel to perform the calculations. If you have access to a statistical package, you are welcome to use it.

These values are given:

yi 39 41 33 45 29 42 21
ŷi 36.1 33.9 37.3 40.2 31.7 38.9 34.8

The sample mean of the actual values is ȳ = 35.7143.

The required squares are

i            1        2       3        4        5        6         7        Sum
(yi − ȳ)²   10.7958  27.9386  7.3674  86.2242  45.0818  39.5100  216.5106  433.4286
(ŷi − ȳ)²    0.1488   3.2917  2.5144  20.1215  16.1146  10.1487    0.8359   53.1756

so that the corresponding sums of squares are

∑ (yi − ȳ)² = 433.4286 and ∑ (ŷi − ȳ)² = 53.1756.

Thus, the coefficient of determination is

R² = [∑ (ŷi − ȳ)²] / [∑ (yi − ȳ)²] = 53.1756/433.4286 = 0.1227.

This is how we would calculate the coefficient of determination. The value of R² is needed for the VIF. In calculating the VIF,
though, only the independent variables are used: we take each of them in turn and regress it on the others.
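The same computation can be done in a short Python sketch (optional; it merely verifies the value of R² found above):

y     = [39, 41, 33, 45, 29, 42, 21]
y_hat = [36.1, 33.9, 37.3, 40.2, 31.7, 38.9, 34.8]

y_bar = sum(y) / len(y)                           # 35.7143
explained = sum((f - y_bar) ** 2 for f in y_hat)  # about 53.176
total     = sum((yi - y_bar) ** 2 for yi in y)    # about 433.429
print(round(explained / total, 4))                # 0.1227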

NB: Rj² = 0 implies that Xj is not related to the other independent variables.

ACTIVITY 2.4

What is the value of V IFj when Rj2 = 0?

DISCUSSION OF ACTIVITY 2.4

If we do the calculation right, it is easy to see that VIFj = 1/(1 − 0) = 1.


Therefore, we have discovered that if Xj is regressed on X1 , X2 , ..., Xj−1 , Xj+1 , ..., Xk , the value VIFj = 1 tells us that
Xj is not related to the other independent variables. On the other hand, if Rj² > 0, there is some correlation between Xj and the
other predictor variables. In that case Rj² > 0 implies that VIFj > 1, meaning that Xj is collinear with the other predictor variables
whenever VIFj > 1.

What does Rj² = 1 tell us?


This is a very rare occurrence, but possible. It tells us that Xj is explained perfectly by a regression on the other independent
variables. When this value is attained, the VIF becomes unbounded, and we write VIFj = ∞.

The last case is used to explain the extent of multicollinearity. If the coefficient of determination of one independent variable
on others is very large (i.e., close to 1), the corresponding VIF is very large.
These two situations lead us to the guidelines for interpreting multicollinearity. To decide about the severity of multicollinearity,
we focus on the maximum VIF and the average of the VIFs. In general, multicollinearity between predictor variables is said to be
severe if

● The largest V IF > 10.

● The mean of the V IF s is substantially greater than 1.

This means that if one of the above conditions is met, we can conclude that there is severe multicollinearity between the
independent variable that was regressed on and the others. However, it is not easy to say what “substantially greater than 1” means.
We have to make it definite for the sake of this module.
We rephrase the rule to be:
Consider multicollinearity as severe if one of the following is true:

● The largest V IF > 10.

● The mean V IF > 5.

ACTIVITY 2.5
Consider the “sales territory performance data” presented in the prescribed textbook. The VIFs were found to be:

V IF s 3.34262 1.97762 1.91021 3.23576 1.60173 5.63932 1.81835 1.80856

Determine if we can conclude that there is severe multicollinearity among the independent variables.

DISCUSSION OF ACTIVITY 2.5


We find that the maximum V IF = 5.63932. This value is clearly not larger than 10, and we cannot decide until the second
condition has been checked. Upon calculating the mean, we find that V IF = 2.6667 which is much less than 5. We conclude that
the independent variables are not severely multicollinear.
We have used the coefficient of determination to calculate the VIF in order to test for multicollinearity. You may ask: “Do we
need this measure for other purposes?” Yes, it has other uses as well.

2.3 Comparing regression models


The model developed must reflect the patterns in the data. It is quite possible to develop more than one model that will reflect the
data patterns. In this case the analyst needs to make a decision on the “best” model. The question is how to choose one of the
models over the others.
The measures that we will use for this purpose are the multiple coefficient of determination R², the adjusted multiple coefficient
of determination R̄², the standard error s, the prediction interval length and the C-statistic. These values measure how well the
independent variables work together to describe, predict and control the dependent variable accurately. In this module we examine
this by assessing whether the overall model gives a high R² and adjusted R² (denoted by R̄²), and a small s (or a short prediction
interval).

● The multiple coefficient of determination R².
This measure was dealt with to some extent earlier and is explored further in this section. When we add an independent
variable to a regression model, it decreases the unexplained variation and increases the explained variation, thus increasing
R². This is true even when the added variable is unimportant. R² is calculated as follows:

R² = (Explained variation)/(Total variation) = [∑ (ŷi − ȳ)²] / [∑ (yi − ȳ)²],

with the sums running over i = 1, . . . , n.

ACTIVITY 2.6

Interpret the value of R² found in Activity 2.3.

DISCUSSION OF ACTIVITY 2.6

In Activity 2.3 we found R² = 0.1227. That is, only about 12% of the total variation in the response variable is explained by
the predictor variables. This suggests that other important predictor variables were not included in the model, or that the
functional form of the model was incorrect, so it failed to fit the data.

● The adjusted multiple coefficient of determination R̄².


It was said above that adding a predictor variable to a regression model increases the value of R² even if the predictor
variable is unrelated to the response variable. To avoid overestimating the importance of predictor variables, many analysts
recommend calculating the adjusted multiple coefficient of determination given by

R̄² = (R² − k/(n − 1)) × ((n − 1)/(n − (k + 1)))

where R² is the multiple coefficient of determination, n is the number of observations and k is the number of predictor
variables.
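As a sketch, the formula can be wrapped in a small Python helper; the numbers in the example call are hypothetical, chosen only
for illustration:

def adjusted_r2(r2, n, k):
    # Adjusted multiple coefficient of determination, as defined above
    return (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))

# Hypothetical illustration: R^2 = 0.775 from n = 25 observations, k = 2 predictors
print(round(adjusted_r2(0.775, 25, 2), 4))  # 0.7545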

ACTIVITY 2.7

How does this measure behave when an additional independent variable is included in the regression model?

DISCUSSION OF ACTIVITY 2.7

This measure depends on R², but also on n and k. Adding an independent variable always increases R²; R̄², however, increases
only if the gain in R² outweighs the penalty for the extra parameter, which is exactly what the adjustment is designed to do.
Even so, R² and R̄² alone do not provide adequate assistance in choosing a model, so let us try s, the standard error.

● The standard error s


Consider the notation used earlier. The sum of squared forecast errors (SSE) is defined as

SSE = ∑ (yi − ŷi)².

One criterion, considered better than R² and adjusted R² for measuring the value of including an additional independent
variable, is the standard error given by

s = √[SSE/(n − k − 1)].

The guideline is that if s increases when we add another independent variable, then that independent variable should not be
added. It is desirable to have a small s. A large s is equivalent to a wide confidence interval; if we were to use the prediction
interval length, short intervals would then indicate a desirable model. We will only use s in this module, but note
that in practice you may be required to use prediction intervals. Note the equivalence.
The next measure for comparing regression models that will be discussed is the C-statistic.

● The C-statistic
The C-statistic, also called the Cp-statistic, is another valuable measure for comparing regression models. Let sp² denote
the mean square error based on the model using all p potential independent variables. If SSE denotes the unexplained variation
for another particular model that has k independent variables, then the C-statistic for this model is

C = SSE/sp² − [n − 2(k + 1)].

ACTIVITY 2.8

Show that the C-statistic may be rewritten as

C = SSE/sp² + 2k + 2 − n.

DISCUSSION OF ACTIVITY 2.8

This activity is straightforward. Please complete it by yourself.


In the use of the C-statistic, we recall that we want SSE to be small. Thus, we want the C-statistic to be small to trust in the
model. In practice, we should find a model for which the C-statistic is roughly equal to k + 1, the number of parameters in the
regression model.
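The following Python sketch ties SSE, s and the C-statistic together; all the numbers in the example are hypothetical, chosen only
so that the arithmetic is easy to follow:

import math

def standard_error(sse, n, k):
    return math.sqrt(sse / (n - k - 1))

def c_statistic(sse, s2p, n, k):
    # s2p is the mean square error of the model using all p candidate predictors
    return sse / s2p - (n - 2 * (k + 1))

# Suppose a k = 2 predictor model has SSE = 180 from n = 25 observations,
# and the full model gives s2p = 8.0:
print(round(standard_error(180, 25, 2), 4))    # 2.8604
print(round(c_statistic(180, 8.0, 25, 2), 2))  # 3.5, compared with k + 1 = 3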

ACTIVITY 2.9

It says in the description of SSE that we want SSE to be small. Explain why we want this measure to be small.

DISCUSSION OF ACTIVITY 2.9

If one looks at the formula for the measure, it may be written as

s² = SSE/(n − k − 1) = [∑ (yi − ŷi)²]/(n − k − 1).

In isolation we analyse

SSE = ∑ (yi − ŷi)².

This is the sum of the squared differences between the actual values and the estimates. Ideally, if the estimates are perfect
predictions, they will replicate the actual values. Then the differences will be zero. This will therefore result in SSE = 0, the
smallest possible value of SSE. Therefore, if the model used predicts the actual values satisfactorily, then the differences will be
small and SSE will be small.
Look at Example 5.1 (Bowerman et al. 2005: 228).
The output from MINITAB and SAS that appears on page 229 resulted from calculating R², R̄², s and the Cp-statistic.
The MINITAB output gives the two best models of each size in terms of s, R2 and the C-statistic. Thus, we find the two best
one-variable models, the two best two-variable models, . . ., the two best eight-variable models. Note that the adjusted R2 increases
considerably when a second variable is added. There is no problem with the inclusion of ACCTS because it is a good predictor of
the dependent variable.

ACTIVITY 2.10

Use the output on p. 229 to answer the following.

(a) If a model with only two variables is to be used, which variables would you use?

(b) A model using five variables is the best. Do you agree? Justify your answer.

DISCUSSION OF ACTIVITY 2.10

(a) The model using ACCTS and ADVERT as predictors explains 77.5% of the variation, R2 = 0.775, more than the model
including MKTPOTEN and MKTSHARE.

(b) The models using five predictors have the smallest C-statistics (4.4) and C is closer to the number of parameters k + 1 = 6.

Discussion
We know that most of the time series models we will develop in future as forecasters will not be 100% accurate.
The error, e = y − ŷ, is the deviation between the actual value and the estimate. In Statistics we use interesting terms: we speak
of a residual when we mean an estimate of the error.
There are methods in Statistics to deal with these deviations so that our predictions remain useful despite the presence of the
errors. We refer to them as residual analysis.

2.4 Basic residual analysis


We defined residuals earlier, and we have also used them in other calculations in this module. Do you remember where?

ACTIVITY 2.11
Indicate if the following measures use residuals or not. You may explain in the space provided:

Measure Involves residuals Explanation


Yes No
Unit 1
Forecast error
Absolute deviation
MAD
Squared error
MSE
APE
MAPE
Unit 2
Mean
Standard deviation
VIF
R2
Adjusted R2
Standard error
Mean square error
SSE
C-statistic

This is very interesting. There are links among these measures. Do you see the links? This activity also ensures that we revise
previous work. Can you see how much we have learnt so far?
If you answered “yes” for most of the measures, that is an indication of the importance of residuals. The vehicle we will use in
this module to show this importance is residual analysis.
Residual analysis assists us in the prediction task. It helps us to detect errors in the models we develop, and gives us an
indication of whether we are on the right track.
For this we use graphical plots of residuals. We call them residual plots.

2.4.1 Residual plots


Residuals are calculated for each observed y-value and then plotted against values of the independent variable, the predicted
values ŷi or the time variable. These plots are used to test the assumptions of constant error variance, correct functional form,
independence and normality.

2.4.2 Constant variance assumption


Let us start with an easy activity from a previous scenario.

ACTIVITY 2.12
From Unit 1, Data set 1.4 Week 3 was as follows:

Day 1 2 3 4 5 6 7
yi 21 15 20 27 13 25 11
ŷi 27 11 20 26 14 22 9

Plot the residuals.

DISCUSSION OF ACTIVITY 2.12

We start by writing the data in the required form.

yi 21 15 20 27 13 25 11
ŷi 27 11 20 26 14 22 9
yi − ŷi −6 4 0 1 −1 3 2

That is, the residuals are


Residuals −6 4 0 1 −1 3 2

Residual plot of milk data
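The figure itself is not reproduced here, but the following Python sketch (assuming the matplotlib package is available) generates
such a residual plot:

import matplotlib.pyplot as plt

days = range(1, 8)
residuals = [-6, 4, 0, 1, -1, 3, 2]

plt.scatter(days, residuals)
plt.axhline(0, linewidth=1)   # zero-mean reference line
plt.xlabel("Day")
plt.ylabel("Residual (y - y_hat)")
plt.title("Residual plot of the milk data, Week 3")
plt.show()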

Remember that we are using residual plots to test the assumption that e = y − ŷ has a normal distribution with mean 0 and
variance σ². We use the above plot to test the constant variance assumption. If the residuals are randomly distributed around the
zero mean we can assume constant error variance. If, however, the residual plot “fans out” or “funnels in” we have an increasing
or decreasing error variance which implies that the assumption of constant error variance is violated.
Let us share something with you about the residual plot for the milk data.
If you visually place the residual plot in the box below and use lines to explain its shape, it cannot be appropriately explained
by a parallel band of the following form:

Also, it does not look like it can be appropriately explained by a fan shape of the form

Instead, it looks very much like it can be appropriately explained by a funnel shape of the form

Thus the residuals for the milk data violate the assumption of constant variance. The prescribed textbook provides further
illustrations on page 238.

2.4.3 Correct functional form assumption
The model specified from the given data may be correct or incorrect. Using a residual plot, we can determine whether the
functional form is correct. If it is incorrect, the residual plot constructed from the fitted model will display the pattern of the
appropriate form, from which a correct form can be found. For example, if we use a simple linear regression model when the
true relationship between Y and X is curved, the residual plot will appear as a curve.

2.4.4 Normality assumption


We remember that a normal distribution exists when we can represent data by a bell shape with symmetry around a central point.
In the current setup, if the normality assumption holds, a histogram and/or stem-and-leaf display of the residuals should assume a
bell shape. Consider the residuals for the milk data used in Activity 2.12.

The above plot shows no evidence of a bell shape. The normality assumption is violated.
We can also employ a normal plot of the residuals to determine normality. The procedure is as follows:

• Calculate the ordered residuals e(1) , e(2) , . . . , e(n) , then place them on the horizontal axis.

• Calculate the ratios

a(i) = (3i − 1)/(3n + 1), i = 1, . . . , n.
These represent the areas under the standard normal curve from the left side.

• Calculate z(i) as the value on the horizontal axis under the standard normal curve such that the area under the curve to the left
of z(i) is a(i) . Then place the z(i) on the vertical axis.

• The scatter plot generated by (e(i) , z(i) ), i = 1, . . . , n is the normal probability plot.

• The assumption of normality is not violated when the normal probability plot approximately follows a straight line.
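If you have access to Python, the whole procedure can be sketched in a few lines. This is our own illustration (it assumes the scipy library), not prescribed material:

from scipy.stats import norm

residuals = [-6, 4, 0, 1, -1, 3, 2]
n = len(residuals)

# Ordered residuals e_(1) <= ... <= e_(n)
e_ordered = sorted(residuals)

# Areas a_(i) = (3i - 1) / (3n + 1)
a = [(3 * i - 1) / (3 * n + 1) for i in range(1, n + 1)]

# z_(i): the standard normal value with area a_(i) to its left
z = [norm.ppf(ai) for ai in a]

for e, ai, zi in zip(e_ordered, a, z):
    print(e, round(ai, 4), round(zi, 3))
# Plotting the pairs (e_(i), z_(i)) gives the normal probability plot.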

ACTIVITY 2.13

Use a normal plot for the data of Activity 2.12 to determine whether the data come from a normal distribution or not.

DISCUSSION OF ACTIVITY 2.13

In Activity 2.12, n = 7 and the residuals are as follows:

ei −6 4 0 1 −1 3 2

The ordered residuals are

e(i) −6 −1 0 1 2 3 4

The a(i) = (3i − 1)/(3n + 1), i = 1, 2, ..., 7 are

a(i) 0.0909 0.2273 0.3636 0.5000 0.6364 0.7727 0.9091

The corresponding points from the normal probability table are

z(i) −1.335 −0.748 −0.349 0 0.349 0.748 1.335

For illustration, z(1) = −1.335 was found as follows: The area to the left of z(1) under the standard normal curve is a(1) =
0.0909. Obviously z(1) is negative since this area is less than 0.5. Thus, the area under the standard normal curve between z(1) and
0 is 0.5 − 0.0909 = 0.4091.
By symmetry of the standard normal curve, let us find z*(1) = −z(1) such that the area between 0 and z*(1) is 0.4091. The value
0.4091 is not in Table A1, but it lies between 0.4082 and 0.4099, which correspond to z = 1.33 and z = 1.34, respectively. The
mid-value is thus z*(1) = (1.33 + 1.34)/2 = 1.335, which implies that z(1) = −1.335.
The other z-scores are found in a similar way. Please try to find at least two of them. This manual calculation is time consuming,
but one can use Excel with the function =NORM.S.INV(). For example, =NORM.S.INV(0.0909) = −1.335233304; rounded to four
decimal places, it gives the same result as the one found manually.

Therefore, we plot

e(i) −6 −1 0 1 2 3 4

z(i) −1.335 −0.748 −0.349 0 0.349 0.748 1.335

Normal probability plot of the milk data

Does your graph give you a straight line?


Comment to conclude the activity.

2.4.5 Independence assumption


In time series, an important concept often discussed alongside independence is autocorrelation. Autocorrelation describes a
pattern in the errors. We say that error terms occurring over time have positive autocorrelation if a positive error term in one
time period tends to produce, or be followed by, another positive error term in a later period. Graphically, positive autocorrelation
is characterised by a cyclical pattern when residuals are plotted against ordered time points.

On the basis of the above, how would you define negative autocorrelation? Since these are time series, the resulting error
terms are also time-based.

In the case where the time-dependent errors do not display a cyclic or alternating pattern, we say that the error terms are
statistically independent.

2.4.6 Remedy for violations of assumptions


A common approach to remedy the problem is to transform the dependent variable. One may raise the dependent variable to
suitable powers. For example, a rapidly increasing error variance that occurs as the dependent variable increases may be
remedied by a root transformation (such as the square root) or a logarithmic transformation.
There may be other reasons for transforming data. Transformation may also be done by multiplying by appropriate
factors to rescale the data. What example can you think of where multiplication by a factor is needed? We leave the rest for your own
reading.
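As a small illustration of such transformations (a sketch of our own, using hypothetical values), compare how the square root and the logarithm compress a series whose values grow rapidly:

import math

# Hypothetical series whose variability grows with its level
y = [4, 9, 25, 49, 100, 196, 400]

# Square-root transformation: moderately stabilises an increasing variance
y_sqrt = [math.sqrt(v) for v in y]

# Logarithmic transformation: a stronger variance-stabilising transformation
y_log = [math.log(v) for v in y]

print([round(v, 2) for v in y_sqrt])
print([round(v, 2) for v in y_log])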

2.5 Outliers and influential observations
Observations that lie far away from the bulk of your data are called outliers. Some outliers influence the measures derived from
your data; these are called influential observations.
Influential observations have a serious effect on the analysis. To test for the effects caused by a suspected data point, we perform
calculations and estimations, e.g. leverage values, studentised residuals and Cook’s measure. Then we could remove the suspected
data point and perform the same calculations to observe the change in the findings.

Outliers are not necessarily errors, as we may be led to believe. They are often very high or very low values that occur because
of conditions that existed at the time they were observed. Some may signal good fortune while others may indicate hardship. When
great successes are observed, analysts may examine the factors that contributed to them. It is better to take note of the conditions
that produced the outlier than simply to eliminate it!

Be warned also that sometimes low values and high values may occur due to seasonality, not because they are just outliers. Out
of the time series context they may be judged as ‘bad’ or ‘good’ while under the time series scope they may be normal values with
a useful implication.

ACTIVITY 2.14

Are there outliers in the following data set? If so, please identify them.

x 40 36 49 1207 23 38 27 44 45 30
y 90 77 87 46 290 79 58 66 87 66

DISCUSSION OF ACTIVITY 2.14

In the x data set, most values lie in the region of the twenties to the forties. The outlier is therefore x = . The y-values
lie in the forties to the nineties so that the outlier is y = .

ACTIVITY 2.15

Calculate the means and the standard deviations of the data in Activity 2.14.
DISCUSSION OF ACTIVITY 2.15

x y
Mean 153.9 94.6
Standard deviation 370.11 70.09

ACTIVITY 2.16

Remove the values which you said were outliers in Activity 2.14. Calculate the means and standard deviations. Were these data
points influential?

DISCUSSION OF ACTIVITY 2.16

The new data sets are

x 40 36 49 38 27 44 45 30
y 90 77 87 79 58 66 87 66

If you did not get the correct answers in Activity 2.14, this is the time to update your answers to that question.

x y
Mean 38.63 76.25
Standard deviation 7.520 11.781
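These calculations are easy to verify with a short script. The following sketch (our own illustration, using the standard library only) recomputes the means and sample standard deviations with and without the outliers:

import statistics

x = [40, 36, 49, 1207, 23, 38, 27, 44, 45, 30]
y = [90, 77, 87, 46, 290, 79, 58, 66, 87, 66]

for data, outlier in [(x, 1207), (y, 290)]:
    cleaned = [v for v in data if v != outlier]
    print("with outlier:   ", round(statistics.mean(data), 2),
          round(statistics.stdev(data), 3))
    print("without outlier:", round(statistics.mean(cleaned), 2),
          round(statistics.stdev(cleaned), 3))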

Are there substantial differences from these measures based on the original data? Well, this is obvious. What do you conclude?

ACTIVITY 2.17

Explain if outliers and influential observations are the same.

DISCUSSION OF ACTIVITY 2.17

This is a question given to remove a possible misconception that if a value lies far away from the others, it will also influence
measures calculated from the data set. There are some statistical measures that are easily influenced by outliers, such as the mean
and the standard deviation. But the median and the mode are not influenced that easily. Do you see why?

2.5.1 Leverage values


Leverage values are used to identify outliers with respect to the x-values. The leverage value for an observation xi is the distance
value of that observation.

A leverage value is considered to be large if it is greater than twice the average of all the leverage values, which can be calculated
as 2 (k + 1) /n, where k is the number of predictors and n the sample size. Leverages are usually computed using statistical software
packages.
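As a sketch of how leverages could be computed directly (our own illustration, using the hat matrix that is introduced formally in Section 2.5.4, with the x-values of Activity 2.14 as input):

import numpy as np

x = np.array([40, 36, 49, 1207, 23, 38, 27, 44, 45, 30], dtype=float)
n, k = len(x), 1  # one predictor

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])

# Hat matrix H = X (X'X)^(-1) X'; its diagonal holds the leverage values
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverages = np.diag(H)

# Flag leverages above twice their average, 2(k + 1)/n
cutoff = 2 * (k + 1) / n
print(np.round(leverages, 4))
print("large:", np.where(leverages > cutoff)[0])  # index 3 is x = 1207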

2.5.2 Residual magnitude


In order to identify outliers with respect to their y-values, we can use residuals as before. The rule of thumb is that any residual
that is substantially different from the others is suspect. Before going any deeper, we should experiment with our data and calculate
the residuals.

ACTIVITY 2.18

This activity is included to give you a feeling for the calculations done when analysing residuals. These are unrealistic data,
just to prove the point. In real life this analysis will be done by a computer, but here we use a simple dataset.
Consider the following data:

x 40 36 49 1207 23 38 27 44 45 30
y 90 77 87 46 290 79 58 66 87 66

(a) Find the fitted regression equation ŷ = β̂0 + β̂1 x using the method of least squares.

(b) Calculate the residuals.

(c) Identify residuals that are suspect.

DISCUSSION OF ACTIVITY 2.18

(a) The method of least squares provides the values of β̂0 and β̂1 as follows:

β̂1 = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²]
   = [10 (86194) − (1539) (946)] / [10 (1469709) − (1539)²]
   = −593954 / 12328569
   = −0.048

and

β̂0 = [Σy − β̂1 Σx] / n
   = [946 − (−0.048) (1539)] / 10
   = 102.

The fitted regression equation is therefore

ŷ = 102 − 0.048x.

(b) To calculate the residuals, we estimate y-values using the equation above and the following x-values:

x 40 36 49 1207 23 38 27 44 45 30

The estimates of y derived from the above are

ŷ 100.087 100.280 99.654 43.865 100.906 100.184 100.714 99.895 99.846 100.569

The values to be used for calculating the residuals are

y 90 77 87 46 290 79 58 66 87 66
ŷ 100.087 100.280 99.654 43.865 100.906 100.184 100.714 99.895 99.846 100.569

The residuals are

e −10.087 −23.280 −12.654 2.135 189.094 −21.184 −42.714 −33.895 −12.846 −34.569

(c) The residuals that are suspect are the fourth and the fifth ones, namely

e 2.135 189.094

The residual 2.135 is extremely low while 189.094 is extremely high.


Raw residuals are often not very informative. A better way is to use standardised residuals, obtained by dividing the residuals by
their standard deviations. A standardised residual greater than 2 is an indication of a y-outlying observation.

2.5.3 Studentised residuals


We also note that we "should" suspect that the fifth y-value is an outlier. It is only sensible that we should find a way to confirm
it. To identify outliers with respect to y, we can use residuals. A studentised residual is the observation's residual divided by its
standard error. If the value is greater than 2, we can assume the observation to be an outlier.

ACTIVITY 2.19

Use the data of Activity 2.14 to calculate the studentised residuals.

DISCUSSION OF ACTIVITY 2.19

First, we need SSE:

SSE = Σ (yi − ŷi)² = 41346.9053.

Then

s = √[SSE / (n − 2)] = √(41346.9053 / 8) = 71.8913.

Now we want the distances Di so that we can evaluate

si = s √(1 + Di).

Now

Di = 1/n + (xi − x̄)² / SSxx

where

SSxx = Σ (xi − x̄)², with x̄ = Σxi /n = 1539/10 = 153.9, giving SSxx = 1232856.9.

i 1 2 3 4 5
Di 0.1125 0.1133 0.1107 0.9825 0.1161

i 6 7 8 9 10
Di 0.1129 0.1152 0.1117 0.1115 0.1145

Now we want

si = s √(1 + Di).

They are

i 1 2 3 4 5
si 75.8274 75.8547 75.7661 101.2239 75.9500

i 6 7 8 9 10
si 75.8411 75.9194 75.8002 75.7933 75.8956

The studentised residuals are then derived from

ei^stud = ei / si .

Recall that the residuals were found to be:

e −10.087 −23.280 −12.654 2.135 189.094 −21.184 −42.714 −33.895 −12.846 −34.569

The studentised residuals are

i        1        2        3        4       5
ei^stud  −0.1330  −0.3069  −0.1670  0.0211  2.4897

i        6        7        8        9        10
ei^stud  −0.2793  −0.5626  −0.4472  −0.1695  −0.4555

As expected, the fifth observation is an outlier with respect to y since the corresponding studentised residual 2.4897 is greater
than 2.
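The whole calculation can be checked with a short script. This is a sketch of our own that implements exactly the formulas used above (small rounding differences from the hand-computed tables are possible); it assumes numpy is available:

import numpy as np

x = np.array([40, 36, 49, 1207, 23, 38, 27, 44, 45, 30], dtype=float)
e = np.array([-10.087, -23.280, -12.654, 2.135, 189.094,
              -21.184, -42.714, -33.895, -12.846, -34.569])
n = len(x)

# Distance values D_i = 1/n + (x_i - x_bar)^2 / SS_xx
ss_xx = np.sum((x - x.mean()) ** 2)
D = 1 / n + (x - x.mean()) ** 2 / ss_xx

# s = sqrt(SSE / (n - 2)) and s_i = s * sqrt(1 + D_i), as above
s = np.sqrt(np.sum(e ** 2) / (n - 2))
s_i = s * np.sqrt(1 + D)

# Studentised residuals; values above 2 flag outliers with respect to y
print(np.round(e / s_i, 4))  # the fifth value is about 2.49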

Sometimes an “obvious” outlier cannot be detected using studentised residuals. Studentised deleted residuals may also be used.
Thereafter we will also use Cook’s distance.

Deleted residuals
The deleted residual for observation i is calculated by subtracting from yi the point estimate computed using least squares point
estimation based on all n observations except observation i. This is done because, if yi is an outlier with respect to its y-value,
using this observation to compute the usual least squares point estimates might draw the usual point prediction ŷi towards yi and
thus cause the resulting usual residual to be small. This would falsely imply that observation i is not an outlier with respect to its
y-value. Studentised deleted residuals are usually computed using statistical software packages.

ACTIVITY 2.20
Inspect the output on p. 256 of the textbook.

2.5.4 Cook’s distance


Cook’s distance (CD) is a useful statistic that is sensitive to both outliers and leverage points. Because of this, it makes an
effective measure for detecting them. There are other measures, but this one is considered the single most representative measure
of influence on overall fit. Cook’s distance (CDi ) measures the change in the regression coefficients that would occur if the ith
observation was omitted. It is defined as

e2i hi
Di = (k+1)s2
[ (1−hi)
2].

where ei = yi − ŷi is the ith residual, s2 is the model standard error and hi the ith diagonal element of the hat matrix. It is
known that a multiple linear regression model is expressed, in matrix form, as

y = Xβ + ϵ

where
y is the vector of responses, X is the design matrix, β is the vector of model parameters and ϵ is the vector of error terms. The hat
matrix is then defined by:
−1
H = X (X ′ X) X′ .

Cook’s distance is compared to F-critical value, say F0.05 (k + 1, n − (k + 1)) to see if it is significant. To guide us further we shall
also use the following rule of the thumb:

● A value of Di > 1.0 would generally be considered large.
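To see the definition in action, here is a sketch of our own that computes Cook's distances from scratch for a linear model (the data are the values of Activity 2.14; numpy is assumed):

import numpy as np

def cooks_distances(X, y):
    # Least squares fit, residuals, leverages and Cook's distances
    n, p = X.shape                               # p = k + 1
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta                             # residuals
    H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix
    h = np.diag(H)                               # leverages h_i
    s2 = np.sum(e ** 2) / (n - p)                # model mean squared error
    return (e ** 2 / (p * s2)) * (h / (1 - h) ** 2)

x = np.array([40, 36, 49, 1207, 23, 38, 27, 44, 45, 30], dtype=float)
y = np.array([90, 77, 87, 46, 290, 79, 58, 66, 87, 66], dtype=float)
X = np.column_stack([np.ones(len(x)), x])
print(np.round(cooks_distances(X, y), 4))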

ACTIVITY 2.21
The table below is about the need for labor in 17 navy hospitals in the USA, as described in the textbook on page 254. The
response variable, Y , is the number of monthly hours required, and the independent variables are
X1 : monthly X-ray exposure
X2 : monthly occupied bed days - a hospital has one occupied bed day if one bed is occupied for an entire day
X3 : average length of patients’ stay, in days.
Hospital Hours (Y ) Xray (X1 ) BedDays (X2 ) Length (X3 )
1 566.52 2463 472.92 4.45
2 696.82 2048 1339.75 6.92
3 1033.15 3940 620.25 4.28
4 1603.62 6505 568.33 3.9
5 1611.37 5723 1497.6 5.5
6 1613.27 11520 1365.83 4.6
7 1854.17 5779 1687 5.62
8 2160.55 5969 1639.92 5.15
9 2305.58 8461 2872.33 6.18
10 3503.93 20106 3655.08 6.15
11 3571.89 13313 2912 5.88
12 3741.4 10771 3921 4.88
13 4026.52 15543 3865.76 5.5
14 10343.81 36194 7684.1 7
15 11732.17 34703 12446.33 10.78
16 15414.94 39204 14098.4 7.05
17 18854.45 86533 15524 6.35

STATA output for fitting model y = β0 + β1 x1 + β2 x2 + β3 x3 + ϵ is the following:

reg y x1 x2 x3

Source | SS df MS Number of obs = 17


-------------+---------------------------------- F(3, 13) = 431.97
Model | 489799064 3 163266355 Prob > F = 0.0000
Residual | 4913457.43 13 377958.264 R-squared = 0.9901
-------------+---------------------------------- Adj R-squared = 0.9878
Total | 494712521 16 30919532.6 Root MSE = 614.78

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .0529876 .020092 2.64 0.021 .0095813 .0963938
x2 | .9784801 .1051542 9.31 0.000 .7513084 1.205652
x3 | -320.9485 153.193 -2.10 0.056 -651.9019 10.00497
_cons | 1523.372 786.9016 1.94 0.075 -176.6255 3223.37
------------------------------------------------------------------------------

(a) Write down the equation of the fitted model.

(b) The following table presents STATA output for the diagnostic statistics discussed above. Indicate outlying observations in
terms of (1) standardised residuals, (2) studentised residuals, (3) leverages and (4) Cook's distance. Show all steps.

Hosp y x1 x2 x3 Res (e) StaRes StuRes Lev (hi ) Cookd (Di )


1 566.52 2463 472.92 4.45 -121.8827 -0.2114 -0.2035 0.1207 0.0015
2 696.82 2048 1339.75 6.92 -25.0260 -0.0463 -0.0445 0.2261 0.0002
3 1033.15 3940 620.25 4.28 67.7641 0.1182 0.1136 0.1297 0.0005
4 1603.62 6505 568.33 3.9 431.1631 0.7646 0.7517 0.1588 0.0276
5 1611.37 5723 1497.6 5.5 84.5947 0.1438 0.1383 0.0849 0.0005
6 1613.27 11520 1365.83 4.6 -380.5936 -0.6570 -0.6419 0.1120 0.0136
7 1854.17 5779 1687 5.62 177.6171 0.3019 0.2911 0.0841 0.0021
8 2160.55 5969 1639.92 5.15 369.1505 0.6270 0.6118 0.0830 0.0089
9 2305.58 8461 2872.33 6.18 -493.1764 -0.8384 -0.8283 0.0846 0.0162
10 3503.93 20106 3655.08 6.15 -687.4006 -1.1921 -1.2136 0.1203 0.0486
11 3571.89 13313 2912 5.88 380.9370 0.6451 0.6299 0.0773 0.0087
12 3741.4 10771 3921 4.88 -623.0935 -1.1172 -1.1290 0.1771 0.0671
13 4026.52 15543 3865.76 5.5 -337.7909 -0.5681 -0.5527 0.0645 0.0056
14 10343.8 36194 7684.1 7 1630.5050 2.8707 4.5584 0.1465 0.3535
15 11732.2 34703 12446.3 10.78 -348.6927 -1.0054 -1.0059 0.6818 0.5414
16 15414.9 39204 14098.4 7.05 281.9251 0.9901 0.9893 0.7855 0.8973
17 18854.4 86533 15524 6.35 -406.0002 -1.7858 -1.9750 0.8632 5.0328
DISCUSSION OF ACTIVITY 2.21
(a) The fitted model is: ŷ = 1523.372 + 0.0530x1 + 0.9785x2 − 320.9485x3 .

(b) (1) and (2) The observation with standardised and studentised residuals greater than 2 corresponds to hospital 14. Therefore,
the outlying observation in its y-value is the one corresponding to hospital 14.

(3) To decide on the observations with high leverage, that is, the outliers in the x-values, we must find observations with leverages
greater than 2(k + 1)/n, where k = 3 and n = 17. We find

2(k + 1)/n = 2(3 + 1)/17 = 8/17 ≈ 0.4706.
Therefore, outliers in the x-values are the observations corresponding to hospitals 15, 16 and 17 since the corresponding leverages
0.6818, 0.7855 and 0.8632, respectively, are greater than 0.4706.

(4) For n = 17, k = 3 and α = 0.05, the F critical value is F0.05 (3 + 1, 17 − (3 + 1)) = F0.05 (4, 13) = 3.18. Hospital 17
corresponds to an influential observation since its Cook's distance value 5.0328 is greater than the F critical value F0.05 (4, 13) = 3.18.
Obviously, the value 5.0328 is also greater than 1.

2.5.5 Dealing with outliers and influential observations


When we analyse data, we do not want a great influence from only a few elements, since it is important to get information about
the majority. Therefore, these influential observations should first be dealt with.

In practical situations outliers could have important implications. The patterns of time series, such as seasonality, could be the
result of outlying elements in the data. To identify outliers we inspect leverage points and residuals using the techniques studied
above.

2.6 Conclusion
This study unit explained model building and checking a model for usefulness by examining how far it deviates from the real
observations. Some useful statistics were introduced, and we experimented with them to appreciate their roles. These statistics are
important. You are not required to memorise them, and you are not expected to derive them. However, you need to be able to
interpret computer output on these statistics.
EXERCISES

Consider the values of the pair (X, Y ) given below:

i X Y
1 2 18
2 15 129
3 11 90
4 100 805
5 25 210
6 9 88

Calculate

(a) SSxx

(b) SSxy

where SSxy = Σxi yi − (Σxi)(Σyi)/n

(c) Distance values Di

(d) 2(k + 1)/n where k = 1. Why is k = 1?

(e) Which Di ’s are larger than the value for 2(k + 1)/n?

(f) Can you conclude that there are outliers in the data? Explain.

Open questions

(a) Why do we, as forecasters, have to study residuals, outliers, influential observations and the underlying measures?

(b) What is the role of residuals and of deleted residuals? Clarify your answer. Do residuals also explain deleted residuals?

(c) Why do we need to identify influential observations?

Textbook exercises
Exercise 5.4
Exercise 5.5
Exercise 5.7
Exercise 5.16
If you are unsure whether your answers are correct, discuss them with your fellow students in the Discussion forum on the
module website.

Unit 3

Time series regression

Outcomes table for the study unit.


This study unit focuses on modelling the trend of time series data, and then on detecting and handling first-order autocorrelation.

• Fit the trend of time series data using polynomials, with emphasis on first order.

• Detect autocorrelation using the Durbin-Watson test.

• Determine the types of seasonal variation and use dummy variables for assessing seasonal effects.

Outcomes (at the end of the unit you should be able to):

• Outcome: use polynomial functions in modelling trend.
  Assessment: model trend using polynomial functions.
  Content: data plots, parameter estimation and measures.
  Activities: plot graphs, experiment with data and interpret data.
  Feedback: discuss the activity.

• Outcome: detect autocorrelation.
  Assessment: Durbin-Watson test, graphs.
  Content: autocorrelation detection, DW statistic.
  Activities: perform exercises with the DW statistic.
  Feedback: discuss the activities.

• Outcome: model seasonality using dummy variables.
  Assessment: modelling of seasonality using dummy variables.
  Content: regression with dummy variables.
  Activities: find lengths of seasonality, develop forecasts.
  Feedback: discuss the activities.

3.1 Introduction
This unit is based on Chapter 6 in the prescribed textbook, which is Time Series Regression. It does not require full fluency in
regression; your basic knowledge of polynomials will suffice. We discussed regression models briefly in the previous study units.
There we stated that the variable of interest (Y), which is the dependent variable, is regressed on the variables (factors) on which it

depends. In the past two units we plotted and interpreted some graphs. Did you find them useful? Quadratic equations were also
dealt with at school. Do you remember the parabola? This is the graph of a quadratic equation. You are welcome to refer to school
textbooks for these graphs.

These topics, together with the ones we learnt in study units 1 and 2 such as the components of time series, will be integrated
in this study unit. Do you still remember the components of time series? Attempt to name them.

We defined trend, seasonality, cyclic and irregular patterns in the earlier study units. We will treat trend as it may occur in
a linear pattern, a quadratic pattern and where there is no trend. The linear and quadratic patterns will include decreasing and
increasing trends.

One of the elements we dealt with in the previous study units is independence. Residuals are useful in detecting whether the data are
independent or not. Time series data are observations of the same phenomenon recorded over consecutive time periods. Hence,
they cannot be fully independent. The usual relationship in time series data is autocorrelation. When adjacent residuals are
correlated with each other, we say that they are autocorrelated.

Autocorrelation can be negative or positive. Positive autocorrelation exists when over time, a positive error term is followed
by another positive error term and if over time, a negative error term is followed by another negative error term. On the other
hand, negative autocorrelation exists when over time, a positive error term is followed by a negative error term and if over time,
a negative error term is followed by a positive error term. We will explore this idea further. Residual plots and the Durbin-Watson
statistic will be involved.

Do you remember that some data do not have a seasonal pattern? Analysing data will reveal the presence or absence of
seasonality and when present, we should be able to determine the pattern.
We will show how dummy variables and trigonometric functions may be used to deal with seasonality. Growth curve models
will also be studied. The unit will also show how to deal with autocorrelated errors using first-order autocorrelated process.

3.2 Modeling trend by using polynomial functions


A time series need not have all the components, and each component may be analysed separately. We start by assuming that the
time series being dealt with is described fully by a trend model given by:

yt = T Rt + εt

where

yt = the value of the time series in period t,


T Rt = the trend in time period t,
εt = the error term in time period t.

The value yt can be represented by an average level µt , which changes over time according to the equation, µt = T Rt and by
the error term εt . As we recall that random fluctuations do often occur in a process, the error term represents random fluctuations
that cause yt values to deviate from the average level µt . The three trends that we are going to study in this module are no trend,
linear trend, and quadratic trend.

ACTIVITY 3.1
What do you think “no trend” means?

DISCUSSION OF ACTIVITY 3.1


We said that trend describes long term growth or decline. Thus, ”no trend” means there is no growth or decline. Hence we
anticipate a constant process.

3.2.1 No trend
In qualitative terms one may describe the condition as stable. This is a case of no deterioration and no improvement, therefore a
case of no trend. There is a general constant pattern displayed with no long-run growth or decline over time. In this case the trend
takes some constant value β0 , and is modeled as T Rt = β0 . Generally the case of “no trend” is undesirable, but it may happen.
Who would not want to see change?
Note that the case of “no trend” does not necessarily mean absolutely no change. If the changes are shown by fluctuations (the
ups and downs) in such a way that the average seems constant in the long run, then we have no trend.

3.2.2 Linear trend


A linear trend is modelled as T Rt = β0 + β1 t where β0 is the intercept and β1 is the slope. Thus a time series with linear trend can
be modelled as

yt = T Rt + εt

= β0 + β 1 t + ε t

The values β0 and β1 of the above equation provide us with the shape of the line graph. Try to recall the values that lead to
various shapes.

ACTIVITY 3.2
Discuss the implications of the parameters β0 and β1 on the shape of the linear graph.

DISCUSSION OF ACTIVITY 3.2


The intercept on the vertical axis can provide some information about the history of a time series. In the case of the slope, the
sign indicates the direction of the trend: a negative slope (β1 < 0) shows a decline. Can you explain the cases
β1 > 0 and β1 = 0? There is an obvious relationship between the case β1 = 0 and the case of no trend we discussed. Do you see
it?

3.2.3 Quadratic and higher order polynomial trend


We recall that the quadratic equation has the form y = a+bx+cx2 . Therefore, the equation for the trend is T Rt = β0 +β1 t+β2 t2 . The
highest exponent determines the overall shape, which is an indicator of the behaviour of the dependent variable. The knowledge
we acquired in our school years also comes in handy here!
The quadratic trend may show either an increase or a decrease in the dependent variable. Graphical illustrations of different
scenarios are presented on page 281 in the prescribed textbook.
In general, the pth-order polynomial trend is given by:

yt = T Rt + εt
= β0 + β1 t + β2 t2 + ... + βp tp + εt

ACTIVITY 3.3
Write down the equation for the 3rd -order polynomial trend model.

DISCUSSION OF ACTIVITY 3.3


In this case p = 3, and thus yt = β0 + β1 t + β2 t2 + β3 t3 + ϵt .

The estimation of the regression parameters β0 , β1 , β2 and β3 is done using the method of least squares. The assumptions in
the model are that the error term εt satisfies the constant variance, independence, and normality assumptions.

ACTIVITY 3.4
How would you identify the violations of the assumptions?

DISCUSSION OF ACTIVITY 3.4


We know that the behaviour of the residuals indicates what we missed in the estimation. A horizontal band in the residual plot
confirms the constant variance assumption. Fanning out indicates increasing variance and funnelling in shows decreasing variance.
The normality assumption can be checked using normal plots. Apart from these, histograms and stem-and-leaf diagrams can reveal
the normality pattern as well.

ACTIVITY 3.5

The data Cod Catch described in the prescribed textbook in Example 6.1 is reproduced below.

Month Year 1 Year 2


January 362 276
February 381 334
March 317 394
April 297 334
May 399 384
June 402 314
July 375 344
August 349 337
September 386 345
October 328 362
November 389 314
December 343 365

The company wanted to forecast its minimum and maximum possible revenues from cod catch sales and to plan the operations
of its fish processing plant by making point and interval forecasts of its monthly cod catch (in tons).

(a) Plot the cod catch versus time.

(b) Which type of trend does the plot indicate? Explain your answer.

(c) Determine the point estimate and the 95% prediction interval for the monthly cod catches.

DISCUSSION OF ACTIVITY 3.5

(a) We must first combine in long format the data of the two years, and thus have 24 data points as in the table below.

Time (t) Cod Catch (yt )


1 362
2 381
3 317
4 297
5 399
6 402
7 375
8 349
9 386
10 328
11 389
12 343
13 276
14 334
15 394
16 334
17 384
18 314
19 344
20 337
21 345
22 362
23 314
24 365

The plot of Cod Catch versus Time is displayed below. The plot was done using STATA, but it is also easy to plot in Excel.

Try it. You can also use another statistical package.

(b) The plot of the data reveals a random fluctuation around a constant average level. The maximum is slightly above 400 while
the minimum seems to be halfway between 250 and 300, that is, about 275.
Now, since we assume a random fluctuation around a constant level, it makes sense to believe that a model with no trend
describes the data. Hence, we come to the conclusion that the regression model with no trend is to be used in forecasting the
cod catch in future months. Therefore, we use the following model:
yt = T Rt + εt = β0 + εt

(c) The parameter β0 is a constant. How is it estimated? It is well known that in this case, least squares estimation gives β̂0 = ȳ,
that is:

β̂0 = ȳ = [∑_{t=1}^{24} yt] / 24 = (362 + 381 + ⋯ + 365)/24 = 351.2917.

When there is mention of minimum and maximum, we must remember from first year that it is interval estimates
that are being implied. Now, do you recall interval estimation? In forecasting we speak of point forecasts when point estimates
of future values are of interest, and of prediction interval forecasts when intervals for the predicted future values are of
interest.

The 100(1 − α)% prediction interval of β̂0 = ȳ is:

( ȳ − t_{α/2}^{[n−1]} s √(1 + 1/n) , ȳ + t_{α/2}^{[n−1]} s √(1 + 1/n) ).

Do you remember the formula?

Since n = 24 and 1 − α = 0.95, that is α = 0.05, the t-table gives

t_{α/2}^{[n−1]} = t_{0.025}^{[23]} = 2.069.

Also,

s = √[ ∑_{t=1}^{n} (yt − ȳ)² / (n − 1) ].

The results in the following tables will be used to calculate s.

t yt (yt − ȳ)2
1 362 114.6677
2 381 882.5831
3 317 1175.9207
4 297 2947.5887
5 399 2276.0819
6 402 2571.3317
7 375 562.0835
8 349 5.2519
9 386 1204.6661
10 328 542.5033
11 389 1421.9159
12 343 68.7523
13 276 5668.8401
14 334 299.0029
15 394 1823.9989
16 334 299.0029
17 384 1069.8329
18 314 1390.6709
19 344 53.1689
20 337 204.2527
21 345 39.5855
22 362 114.6677
23 314 1390.6709
24 365 187.9175
Mean / Total ȳ = 351.2917 26314.9583

Hence, s = √[ ∑ (yt − ȳ)² / (n − 1) ] = √(26314.96 / 23) = 33.82497.

We are left with the final calculation of the prediction interval.

The 95% prediction interval is:

( ȳ − t_{0.025}^{[23]} s √(1 + 1/n) , ȳ + t_{0.025}^{[23]} s √(1 + 1/n) )

= ( 351.2917 − 2.069 (33.82497) √(1 + 1/24) ; 351.2917 + 2.069 (33.82497) √(1 + 1/24) )

= (279.8647; 422.7187)
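The same point estimate and prediction interval can be verified with a short script. Here is a sketch of our own (it assumes the scipy library), not prescribed material:

import math
from scipy.stats import t

y = [362, 381, 317, 297, 399, 402, 375, 349, 386, 328, 389, 343,
     276, 334, 394, 334, 384, 314, 344, 337, 345, 362, 314, 365]
n = len(y)

# Point estimate: the sample mean
y_bar = sum(y) / n                                       # 351.2917

# s = sqrt( sum (y_t - y_bar)^2 / (n - 1) )
s = math.sqrt(sum((v - y_bar) ** 2 for v in y) / (n - 1))

# 95% prediction interval: y_bar +/- t * s * sqrt(1 + 1/n)
t_crit = t.ppf(0.975, n - 1)                             # about 2.069
half = t_crit * s * math.sqrt(1 + 1 / n)
print(round(y_bar - half, 4), round(y_bar + half, 4))    # about 279.86 and 422.72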

ACTIVITY 3.6

The demand for a new type of calculator, called Bismark X-12, has been increasing over the last two years in Smith's Department
Stores, Inc., as stated in Example 6.2 of the prescribed textbook. Smith's uses an inventory policy to meet customers'
demand without ordering calculators that may greatly exceed the demand. In order to implement this policy in future months,
Smith's requires both point predictions and prediction intervals for total monthly Bismark X-12 demand. The monthly calculator
demand data for the past two years are given in the following table:

Month Year 1 Year 2


January 197 296
February 211 276
March 203 305
April 247 308
May 239 356
June 269 393
July 308 363
August 262 386
September 258 443
October 256 308
November 261 358
December 288 384

(a) Plot demand versus time.

(b) Which type of trend does the plot indicate? Explain your answer.

(c) Determine the point estimate and the 95% prediction interval of calculator demand for January of the third year.

DISCUSSION OF ACTIVITY 3.6

(a) We must first combine in long format the data of the two years, and thus have 24 data points as in the table below.

Time (t) Demand (yt )


1 197
2 211
3 203
4 247
5 239
6 269
7 308
8 262
9 258
10 256
11 261
12 288
13 296
14 276
15 305
16 308
17 356
18 393
19 363
20 386
21 443
22 308
23 358
24 384

The plot of Demand versus Time is displayed below. The plot was done using STATA, but it is also easy to plot in Excel.
Try it. You can also use another statistical package.

(b) The figure gives an indication of an increasing trend. Therefore, we shall employ the regression equation of the form:

yt = T Rt + εt = β0 + β1 t + εt

(c) The method of least squares can be used to estimate the parameters β1 and β0 , that is:

β̂1 = [nΣtyt − ΣtΣyt] / [nΣt² − (Σt)²]   and   β̂0 = [Σyt − β̂1 Σt] / n.

The following table contains the values required for estimating β1 and β0 .

Time (t) Demand (yt ) t2 tyt (t − t̄)2 ŷt (y − ŷ)2


1 197 1 197 132.25 206.1033 82.87007
2 211 4 422 110.25 214.1776 10.09714
3 203 9 609 90.25 222.2519 370.6357
4 247 16 988 72.25 230.3262 278.0156
5 239 25 1195 56.25 238.4005 0.3594
6 269 36 1614 42.25 246.4748 507.3846
7 308 49 2156 30.25 254.5491 2856.999
8 262 64 2096 20.25 262.6234 0.388628
9 258 81 2322 12.25 270.6977 161.2316
10 256 100 2560 6.25 278.772 518.564
11 261 121 2871 2.25 286.8463 668.0312
12 288 144 3456 0.25 294.9206 47.8947
13 296 169 3848 0.25 302.9949 48.92863
14 276 196 3864 2.25 311.0692 1229.849
15 305 225 4575 6.25 319.1435 200.0386
16 308 256 4928 12.25 327.2178 369.3238
17 356 289 6052 20.25 335.2921 428.8171
18 393 324 7074 30.25 343.3664 2463.494
19 363 361 6897 42.25 351.4407 133.6174
20 386 400 7720 56.25 359.515 701.4552
21 443 441 9303 72.25 367.5893 5686.774
22 308 484 6776 90.25 375.6636 4578.363
23 358 529 8234 110.25 383.7379 662.4395
24 384 576 9216 132.25 391.8122 61.03047
300 7175 4900 98973 1150 22066.6

Hence,

β̂1 = [nΣtyt − ΣtΣyt] / [nΣt² − (Σt)²] = [24 (98973) − (300) (7175)] / [24 (4900) − 300²] = 8.0743

and

β̂0 = [Σyt − β̂1 Σt] / n = [7175 − (8.0743) (300)] / 24 = 198.0290.

The fitted regression equation is:

ŷt = 198.0290 + 8.0743t

which can be used to calculate the point forecast of a future demand yt . January of the third year corresponds to t = 25.
Hence, the point forecast of the demand in January of the third year is:
ŷ25 = 198.0290 + 8.0743(25) = 399.8865. Furthermore, the 95% prediction interval for y25 is:

ŷ25 ± t_{0.025}^{(24−2)} s √[ 1 + 1/24 + (25 − t̄)² / ∑_{t=1}^{24} (t − t̄)² ]

where t_{0.025}^{(24−2)} = t_{0.025}^{(22)} = 2.074 and

s = √[ Σ (yt − ŷt)² / (n − 2) ] = √(22066.6 / 22) = 31.6706.

The 95% prediction interval of y25 is:

ŷ25 ± t_{0.025}^{(22)} s √[ 1 + 1/24 + (25 − t̄)²/1150 ] = 399.8865 ± 2.074 (31.6706) √[ 1 + 1/24 + (25 − 12.5)²/1150 ].

Hence, the 95% prediction interval of y25 is: [328.6090; 471.1640].
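Again, the whole computation can be checked with a short script. The following is a sketch of our own (it assumes numpy and scipy):

import numpy as np
from scipy.stats import t as t_dist

y = np.array([197, 211, 203, 247, 239, 269, 308, 262, 258, 256, 261, 288,
              296, 276, 305, 308, 356, 393, 363, 386, 443, 308, 358, 384],
             dtype=float)
tt = np.arange(1, 25, dtype=float)
n = len(y)

# Least squares estimates of the linear trend y_t = b0 + b1 t
b1 = (n * (tt * y).sum() - tt.sum() * y.sum()) / (n * (tt ** 2).sum() - tt.sum() ** 2)
b0 = (y.sum() - b1 * tt.sum()) / n                  # about 198.029 and 8.0743

# Point forecast for January of year 3 (t = 25)
t_new = 25.0
y_hat = b0 + b1 * t_new                             # about 399.89

# 95% prediction interval
e = y - (b0 + b1 * tt)
s = np.sqrt((e ** 2).sum() / (n - 2))               # about 31.67
se = s * np.sqrt(1 + 1 / n + (t_new - tt.mean()) ** 2 / ((tt - tt.mean()) ** 2).sum())
t_crit = t_dist.ppf(0.975, n - 2)                   # about 2.074
print(y_hat - t_crit * se, y_hat + t_crit * se)     # about 328.6 and 471.2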

ACTIVITY 3.7

Activities 3.5 and 3.6 focused on time series data with no trend and with a linear trend, respectively. In this activity we focus
on time series with a quadratic trend using Example 6.3 in the prescribed textbook. The example is on data collected in two
consecutive years for monthly loan requests by staff members at State University Credit Union. The credit union requires both
point predictions and prediction intervals of monthly loan requests to be made by staff members in future months. The collected
data, in thousands of dollars, are reported in the following table:

Month Year 1 Year 2


January 297 808
February 249 809
March 340 867
April 406 855
May 464 965
June 481 921
July 549 956
August 553 990
September 556 1019
October 642 1021
November 670 1033
December 712 1127

(a) Plot loan request versus time.

(b) Which type of trend does the plot indicate? Explain your answer.

(c) Determine the point estimate and the 95% prediction interval of loan request for January of the third year.

DISCUSSION OF ACTIVITY 3.7

(a) We must first combine in long format the data of the two years, and thus have 24 data points as in the table below.

Time (t) Loan request (yt )


1 297
2 249
3 340
4 406
5 464
6 481
7 549
8 553
9 556
10 642
11 670
12 712
13 808
14 809
15 867
16 855
17 965
18 921
19 956
20 990
21 1019
22 1021
23 1033
24 1127

The plot of Loan request versus Time is displayed below. The plot was done using STATA, but it is also easy to plot in Excel.

Try it. You can also use another statistical package.

(b) The above graph indicates an increasing trend with a decreasing rate. Thus the following quadratic model may be used to
model the data:
yt = T Rt + ϵt = β0 + β1 t + β2 t2 + ϵt .

(c) Parameter estimation can be done using any statistical package, but one can also use Excel, provided the values of t² are first
created as in the following table:
yt t t2
297 1 1
249 2 4
340 3 9
406 4 16
464 5 25
481 6 36
549 7 49
553 8 64
556 9 81
642 10 100
670 11 121
712 12 144
808 13 169
809 14 196
867 15 225
855 16 256
965 17 289
921 18 324
956 19 361
990 20 400
1019 21 441
1021 22 484
1033 23 529
1127 24 576

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.9935
R Square 0.9871
Adjusted R Square 0.9859
Standard Error 31.2469
Observations 24

ANOVA
df SS MS F Significance F
Regression 2 1566730.1527 783365.0764 802.3275 0.0000
Residual 21 20503.6806 976.3657
Total 23 1587233.8333

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 199.6196 20.8480 9.5750 0.0000 156.2639 242.9753
X Variable 1 50.9366 3.8424 13.2564 0.0000 42.9459 58.9274
X Variable 2 -0.5677 0.1492 -3.8048 0.0010 -0.8780 -0.2574

Thus, the fitted model is: ŷt = 199.6196 + 50.9366t − 0.5677t2 .
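The same coefficients can be obtained outside Excel. Here is a sketch of our own using numpy's least squares routine:

import numpy as np

y = np.array([297, 249, 340, 406, 464, 481, 549, 553, 556, 642, 670, 712,
              808, 809, 867, 855, 965, 921, 956, 990, 1019, 1021, 1033, 1127],
             dtype=float)
t = np.arange(1, 25, dtype=float)

# Fit y_t = b0 + b1 t + b2 t^2 by least squares on the columns [1, t, t^2]
X = np.column_stack([np.ones_like(t), t, t ** 2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b0, 4), round(b1, 4), round(b2, 4))
# approximately 199.6196, 50.9366 and -0.5677, matching the output above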

ACTIVITY 3.8
Determine forecasts for the loan requests for April of year 7.

DISCUSSION OF ACTIVITY 3.8

We note that December of year 6 corresponds to t = 72. Can you see why? It then becomes easy to realise that April of year 7
corresponds to t = 76, so we are required to determine the value of ŷ76 :

ŷ76 = 199.62 + 50.937 (76) − 0.5677 (76)² = 791.80

3.3 Detecting autocorrelation


Correlation analysis measures the strength of association between variables. The prefix "auto-" indicates that the strength of the
association is examined for the same variable. Autocorrelation is studied on the same variable, with its observations collected over
time. We recall that correlation can be negative or positive; in fact it lies between −1 and 1, that is, −1 ≤ r ≤ 1.
Since the observations come from the same variable, it is common for the time-ordered error terms to be autocorrelated. When
this happens, it violates the regression assumption that error terms must be independent. Interestingly, there is an easy way to
determine whether the error terms are autocorrelated and to determine the direction (negative or positive) of the autocorrelation.
The pattern of autocorrelation has been discussed earlier.

3.3.1 Residual plot inspection


You may encounter residual plot inspection at various sections as you study. It is therefore an important topic.

ACTIVITY 3.9

Consider the following residual plots of time series data. State in each case if the error terms are negatively autocorrelated,
positively autocorrelated or there is no autocorrelation. The space in “Verdict” below the following graphs allows you to fill in the
answer.

Residual plot (a)

Verdict: The residual plot above shows a autocorrelation.

Residual plot (b)

Verdict: The residual plot above shows a autocorrelation.

Residual plot (c)

Verdict: The residual plot above shows a autocorrelation.

DISCUSSION OF ACTIVITY 3.9

You have made the verdicts by deciding the appropriate pattern for each graph given. Are you happy with your answers?
Residual plot (a) is fully characterised by the “Positive autocorrelation clarification phrase”. Residual plot (b) cannot be related
to any of the two phrases; hence it is an example of a case where there is no autocorrelation. Lastly, Residual plot (c) is fully
characterised by the “Negative autocorrelation clarification phrase”. To convince ourselves even more, we read off the values from
these graphs. The three residual data sets used are:

residuals (a) −2 −7 −4 3 9 14 4 −1 −5 −3 −1 2 5 3
residuals (b) −2 −7 4 −3 9 0 −4 −1 5 −3 1 −1 5 3
residuals (c) −2 7 −4 3 −9 14 −4 1 −5 3 −1 2 −5 3

These residuals confirm the verdicts in the above discussion. An alternative way is to use runs. A run is simply a sequence of
identical signs following each other. If the signs of consecutive residuals appear in runs, then we have positive
autocorrelation. If the signs alternate, we have negative autocorrelation. Where neither of these patterns appears, there is a
random pattern. This is the case where the assumption of independent errors is confirmed. The two cases of autocorrelation are
undesirable since they violate the assumption. In the examples above, we have the following runs:

residuals (a) − − − + + + + − − − − + + +
residuals (b) − − + − + + − − + − + − + +
residuals (c) − + − + − + − + − + − + − +

In the following subsections, we will give a statistic you can calculate so that you do not rely only on visual inspection.

3.3.2 First-order autocorrelation
One type of positive or negative autocorrelation is the first-order autocorrelation, denoted as AR(1) where AR stands for
autoregressive and 1 represents the lag. In this case, the residuals are related to their immediate predecessors. That is the error term
in period t (εt ) is related to the error term in period t − 1, namely; εt−1 . The first-order autocorrelation AR(1) is represented by
the equation εt = ϕ1 εt−1 + at . Here we assume that:

● ϕ1 is the correlation coefficient between error terms separated by one time period; and

● a1 , a2 , ... are values randomly and independently selected from a normal distribution having mean zero and a variance
independent of time.

We promised to show how to determine negative or positive autocorrelation. The Durbin-Watson test will assist in achieving
this. This test can be one-sided (one-tailed) or two-sided (two-tailed). It is important to note the meaning given by each version
of a one-sided test. The Durbin-Watson (DW) statistic is used for all the three versions. The DW statistic will not be used if the
residuals are less than 15 in number or more than 100. We also need the number of predictor variables (k). Here the number of
predictor variables is the power (k) of the polynomial from which the residuals were derived. In simple linear regression, k = 1.
Let e1 , e2 , ..., en be the time-ordered residuals. The DW statistic is:

d = [ ∑_{i=2}^{n} (ei − ei−1)² ] / [ ∑_{i=1}^{n} e²i ]
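The statistic is straightforward to compute. Here is a minimal sketch of our own in Python, checked against the residuals of Activity 3.10 below:

def durbin_watson(e):
    # d = sum_{i=2}^{n} (e_i - e_{i-1})^2 / sum_{i=1}^{n} e_i^2
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den

e = [-2, -7, -4, 3, 9, 14, 4, -1, -5, -3, -1, 2, 5, 3, -1, -4]
print(round(durbin_watson(e), 4))  # 0.7359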

Positive autocorrelation is the first of the three versions that we look at in the use of the DW statistic.

Durbin-Watson test for positive autocorrelation

This version is a one-sided test for positive autocorrelation. It is formulated in clearer detail as follows:

Positive autocorrelation test


1. H0 ∶ The error terms are not autocorrelated.
vs
Ha ∶ The error terms are positively autocorrelated.
2. Calculate d.
3. Set α, the level of significance (usually 1% or 5%).
4. Read off the value from the table on pages 598 to 599 in the prescribed textbook.
5. Make a decision based on the rule:
5.1 Reject H0 if d < dL,α
5.2 Do not reject H0 if d > dU,α
5.3 We are unable to reach a decision if dL,α ≤ d ≤ dU,α

An easy illustration follows:

ACTIVITY 3.10

Use the DW test to determine if the following residuals are positively AR(1). Assume that the model for the residuals was of
the fourth power.

Error terms: −2 − 7 − 4 3 9 14 4 − 1 − 5 − 3 − 1 2 5 3 − 1 − 4
ei    e²i    ei−1    (ei − ei−1)²
−2 4 − −
−7 49 −2 25
−4 16 −7 9
3 9 −4 49
9 81 3 36
14 196 9 25
4 16 14 100
−1 1 4 25
−5 25 −1 16
−3 9 −5 4
−1 1 −3 4
2 4 −1 9
5 25 2 9
3 9 5 4
−1 1 3 16
−4 16 −1 9
Total 462 340

DISCUSSION OF ACTIVITY 3.10

1. H0 ∶ The error terms are not autocorrelated.


vs
Ha ∶ The error terms are positively autocorrelated.

2. d = Σ (ei − ei−1)² / Σe²i = 340/462 = 0.7359

3. We choose α = 0.01.

4. Since we assume that the model used was of the fourth power, and using α = 0.01, we read off from the DW table the values
corresponding to row n = 16 and column k = 4. These values are
dL,0.01 = 0.53 and dU,0.01 = 1.66.

5. We are unable to reach a decision since dL,α ≤ d ≤ dU,α

While we are discussing this activity, one realises that the choice of α can be an important factor. To illustrate this point, suppose
in the above activity we chose α = 0.05. We would then have dL,0.05 = 0.74 and dU,0.05 = 1.93. The decision is to reject H0
since d < dL,α . Interesting! What do you think?

In order to address the activity fully, the decision reached implies that at the 5% level of significance we conclude that the error
terms are positively autocorrelated.

Durbin-Watson test for negative autocorrelation

This version is a one-sided test for negative autocorrelation. It is formulated in similar detail as for positive autocorrelation, as
follows:

Negative autocorrelation test


1. H0 ∶ The error terms are not autocorrelated.
vs
Ha ∶ The error terms are negatively autocorrelated.
2. Calculate d.
3. Set α, the level of significance (usually 1% or 5%).
4. Read off the value from the table on pages 598 to 599 in the prescribed textbook.
5. Make a decision based on the rule:
5.1 Reject H0 if (4 − d) < dL,α
5.2 Do not reject H0 if (4 − d) > dU,α
5.3 We are unable to reach a decision if dL,α ≤ (4 − d) ≤ dU,α

An easy illustration follows.

ACTIVITY 3.11

Use the DW test to determine if the following residuals are negatively AR(1). For argument's sake, assume that the model from
which these residuals were derived was quadratic. Use α = 0.05.

Error terms: − 2 − 7 4 − 3 9 0 − 4 − 1 5 − 3 1 − 1 5 3 − 4 9 − 4

DISCUSSION OF ACTIVITY 3.11

1. H0 ∶ The error terms are not autocorrelated.


vs
Ha ∶ The error terms are negatively autocorrelated.

2. d = Σ (ei − ei−1)² / Σe²i = 992/359 = 2.7632.

3. We were instructed to use α = 0.05.

4. With k = 2, n = 17 and α = 0.05, then dL,0.05 = 1.02 and dU,0.05 = 1.54.

5. Now, (4 − d) = 1.2368. We are unable to reach a decision since dL,α ≤ (4 − d) ≤ dU,α .

When you are required to test any hypothesis, show the steps you follow. This is the reason we formulated the steps formally
for this test to make it easy. There is a tendency for students to start with the statistics, then read the table values and make a
decision about a hypothesis that they did not state. When this happens, note that it is meaningless. No marks should be awarded
for it. You have been advised.

Durbin-Watson test for autocorrelation

Many problems do not explicitly state that we have to test for positive or negative autocorrelation. In that case, the alternative
hypothesis changes and the decision rules for both the positive and negative autocorrelation must be examined. This version is a
two-sided test for autocorrelation.

Positive or negative autocorrelation test


1. H0 ∶ The error terms are not autocorrelated.
vs
Ha ∶ The error terms are positively or negatively autocorrelated.
2. Calculate d.
3. Set α, the level of significance (usually 1% or 5%)
4. Read off the value from the table on pages 598 to 599 in the prescribed textbook.

5. Make a decision based on the rule:

5.1 Reject H0 if d < dL,α/2 or if (4 − d) < dL,α/2

5.2 Do not reject H0 if d > dU,α/2 and (4 − d) > dU,α/2

5.3 The test is inconclusive if dL,α/2 ≤ d ≤ dU,α/2 or if dL,α/2 ≤ (4 − d) ≤ dU,α/2

We note that the steps are the same for all three different statistical hypothesis tests. There is one possibility when the test does
not give a clue. We say that the test is inconclusive, or the test fails.

ACTIVITY 3.12

Use the DW test to determine if the following residuals are positively or negatively AR(1). For argument’s sake, assume that
the model used from where these residuals were derived was to the fifth power. Use α = 0.10.

Error terms: − 2 7 − 4 3 − 9 14 − 4 1 − 5 3 − 1 2 − 5 3 − 9 5 − 2 7 − 1

DISCUSSION OF ACTIVITY 3.12

1. H0 ∶ The error terms are not autocorrelated.


vs
Ha ∶ The error terms are negatively or positively autocorrelated.

2. d = Σ (ei − ei−1)² / Σe²i = 2045/605 = 3.3802

3. α = 0.10 ⟹ α/2 = 0.05.

4. k = 5, n = 19 and thus dL,0.05 = 0.75 and dU,0.05 = 2.02.

5. (a) We are tempted not to reject H0 since d > dU,0.05 , that is, to conclude that the error terms are not positively
autocorrelated. Must we stop here?
(b) Clearly, 4 − d = 0.6198 < dL,0.05 .
Hence, there is sufficient evidence to reject H0 .
That is, the error terms are negatively autocorrelated.

Conclusion: We reject H0 and conclude that the error terms are negatively autocorrelated.

3.4 Seasonal variation types


A time series is discussed in terms of itself, not in terms of the residuals. One way of identifying the types of seasonal variation is
to explore the graphical plot of a time series.
Seasonality was defined in the earlier units. Two types of seasonal variation are defined, namely constant seasonal variation
and increasing seasonal variation, but decreasing seasonal variation can be defined in a similar way.

Before we get to the real stuff, recall the patterns of fluctuations in waves. Waves have peaks and troughs, like the sine and
cosine curves. The magnitude of the fluctuation in these patterns is indicated by the minimum and maximum levels that peaks and
troughs can reach. A swing is a fluctuation shown by peaks and troughs.

Seasonal variation is a component of a time series defined as the repetitive and predictable movement around the
trend line within one year or less. It is detected by measuring time intervals in small units, such as days, weeks, months or quarters.
Organizations facing seasonal variation, like the motor vehicle industry, are often interested in knowing their performance relative
to the normal seasonal variation. The same is true of the Department of Labour in South Africa, which expects unemployment to
increase in December (maybe even January to March) because recent graduates are just arriving in the market and schools
have closed for the summer vacation. The main point is whether the increase is more or less than expected. Organizations
affected by seasonal variation need to identify and measure this seasonality to help with planning for temporary increases or
decreases in labor requirements, inventory, training, periodic maintenance, and so forth. Apart from this, organizations need
to know if the seasonal variation they experience is more or less than the average rate.

Reasons for studying seasonal variation


There are several main reasons for studying seasonal variation.

1. The description of the seasonal effect provides a better understanding of the impact this component has upon a particular
series.

2. After establishing the seasonal pattern, methods can be implemented to eliminate it from the time-series to study the effect
of other components such as cyclical and irregular variations. This elimination of the seasonal effect is referred to as
deseasonalising or seasonal adjustment of data.

3. To project past patterns into the future, knowledge of the seasonal variation is a must.

4. Prediction of the future trend.

Assumptions
A decision maker or analyst must select one of the following assumptions when treating the seasonal component:

1. The impact of the seasonal component is constant from year to year.

2. The seasonal effect is changing slightly from year to year.

3. The impact of the seasonal influence is changing dramatically.

Seasonal Index
Seasonal variation is measured in terms of an index, called a seasonal index. It is an average that indicates the percentage of an
actual observation relative to what it would be if no seasonal variation were present in that period. It is attached to each period
of the time series within a year. This implies that if monthly data are considered there are 12 separate seasonal indices, one for each
month, and 4 separate indices for quarterly data. The following methods are used to calculate seasonal indices to measure seasonal
variation in time series data.

1. Method of simple averages

2. Ratio to trend method

3. Ratio-to-moving average method

4. Link relatives method

In this module you will be required to develop forecasts by focusing on only two of these methods, namely the method of simple
averages and the ratio-to-moving-average method.

An example
Now let us try to understand the measurement of seasonal variation by using the Ratio-to-Moving Average method. This
technique provides an index to measure the degree of the seasonal variation in a time series. The index is based on a mean of
100, with the degree of seasonality measured by variations away from the base. For example if we observe the hotel rentals in a
winter resort, we find that the winter quarter index is 124. The value 124 indicates that 124 percent of the average quarterly rental
occurs in winter. If the hotel management records 1436 rentals for the whole of last year, then the average quarterly rental would
be 359 (= 1436/4). As the winter-quarter index is 124, we estimate the number of winter rentals as follows:

359 × (124/100) = 445

In this example, 359 is the average quarterly rental, 124 is the winter-quarter index, and 445 the seasonalised winter-quarter
rental.

This method is also called the percentage moving average method. In this method, the original data values in the time-series
are expressed as percentages of moving averages. The steps and the tabulations are given below.

Steps

1. Find the centered 12 monthly (or 4 quarterly) moving averages of the original data values in the time-series.

2. Express each original data value of the time-series as a percentage of the corresponding centered moving average values
obtained in step (1). In other words, in a multiplicative time-series model, we get

(Observed values)/(Trend × Cyclical values) × 100 = (T × C × S × I)/(T × C) × 100 = (S × I) × 100

This implies that the ratio–to-moving average represents the seasonal and irregular components.

3. Arrange these percentages according to months or quarter of given years. Find the averages over all months or quarters of
the given years.

4. If the sum of these indices is not 1200 (or 400 for quarterly figures), multiply them by a correction factor = 1200/(sum of
monthly indices) or 400/(sum of quarterly indices). Otherwise, the 12 monthly averages or 4 quarterly averages will be
taken as the seasonal indices.
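These steps can be sketched in a few lines of code. The following is our own illustration (it assumes numpy), using the quarterly data of the worked example that follows:

import numpy as np

# Quarterly series, 2006-2009, quarters I-IV (from the example below)
y = np.array([75, 60, 53, 59, 86, 65, 53, 59,
              90, 72, 66, 85, 100, 78, 72, 93], dtype=float)

# Step 1: 4-quarter moving averages, then centre them
ma4 = np.convolve(y, np.ones(4) / 4, mode="valid")
cma = (ma4[:-1] + ma4[1:]) / 2          # aligned with observations 3 .. n-2

# Step 2: ratios to moving average, as percentages
ratios = y[2:-2] / cma * 100

# Step 3: average the ratios by quarter
pos = np.arange(2, len(y) - 2)
means = np.array([ratios[pos % 4 == q].mean() for q in range(4)])

# Step 4: rescale so that the four indices sum to 400
indices = means * 400 / means.sum()
print(np.round(indices, 2))             # about [127.00, 95.98, 82.63, 94.39]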

Let us calculate the seasonal indices by the ratio-to-moving average method from the following data:

Table data
Year/Quarter I II III IV
2006 75 60 53 59
2007 86 65 53 59
2008 90 72 66 85
2009 100 78 72 93

Now calculations for 4 quarterly moving averages and ratio-to-moving averages are shown in the table below:

Let Q = Quarter, MA = Moving Average, and CMA = Centered Moving Average; then we complete the following table.

Year  Q  y    4 MA total              4 MA             4 CMA (T)                    (y/T) × 100
2006  1  75
      2  60
      3  53   (75+60+53+59) = 247    247/4 = 61.75    (61.75 + 64.50)/2 = 63.125   (53/63.125) × 100 = 83.96
      4  59   (60+53+59+86) = 258    258/4 = 64.50    (64.50 + 65.75)/2 = 65.125   (59/65.125) × 100 = 90.60
2007  1  86   263                    65.75            65.75                        130.80
      2  65   263                    65.75            65.75                        98.86
      3  53   263                    65.75            66.25                        80.00
      4  59   267                    66.75            67.625                       87.25
2008  1  90   274                    68.50            70.125                       128.34
      2  72   287                    71.75            75.00                        96.00
      3  66   313                    78.25            79.50                        83.02
      4  85   323                    80.75            81.50                        104.29
2009  1  100  329                    82.25            83.00                        120.48
      2  78   335                    83.75            84.75                        92.04
      3  72   343                    85.75
      4  93

Calculation of the Seasonal index:


Year Quarter
I II III IV
2006 − − 83.96 90.60
2007 130.8 98.86 80.00 87.25
2008 128.34 96.00 83.02 104.29
2009 120.48 92.04 − −
T otal 379.62 286.9 246.98 282.14
Mean (Seasonal index) 126.54 95.63 82.33 94.05

The total for the seasonal index is 126.54 + 95.63 + 82.33 + 94.05 = 398.55.
Adjusted seasonal index:

Quarter   Value
I         (400/398.55) × 126.54 = 127.00
II        (400/398.55) × 95.63 = 95.98
III       (400/398.55) × 82.33 = 82.63
IV        (400/398.55) × 94.05 = 94.39

The total of the seasonal averages was found to be 398.55. Therefore the corresponding correction factor is 400/398.55 = 1.0036.
Each seasonal average is multiplied by the correction factor 1.0036 to get the adjusted seasonal indices shown in the above table.

Remarks

In general, seasonal indices are calculated without multiplying by 100.

1. In an additive time-series model, the seasonal component is estimated as S = Y − (T + C + I) where S is for Seasonal values,
Y is for observed data values of the time-series, T is for trend values, C is for cyclical values and I is for irregular values.

2. In a multiplicative time-series model, the seasonal component is estimated as

Seasonal effect = (T × S × C × I)/(T × C × I) = Y /(T × C × I)

3. The deseasonalised time-series data will have only trend (T ), cyclical (C) and irregular (I) components and is expressed
as:

(i) Multiplicative model: Y /S = (T × S × C × I)/S = (T × C × I).


(ii) Additive model: Y − S = (T + S + C + I) − S = T + C + I

3.4.1 Constant and increasing seasonal variation


If the magnitude of the seasonal swing does not depend on the level of the time series, the time series is said to exhibit constant
seasonal variation. Figure 6.13 in the prescribed textbook displays such a time series. Increasing seasonal variation is displayed
in Figure 6.14, where the magnitude of the seasonal swing increases. Clearly, the magnitude of the seasonal fluctuation increases
as the level of the time series increases: the peaks and troughs are farther apart on the right side of the display than on the left.
This type of time series is more difficult to handle. We usually make it easier to work with by applying transformation methods
that make the seasonal variation approximately constant. That is, we transform a time series with increasing seasonal variation
so that it behaves like one with constant seasonal variation.

ACTIVITY 3.13

Identify the type of seasonal variation for the time series:

(a) that takes the shape of an increasing linear trend.

(b) described by y = ax^2 + bx + c with a > 0, where the y-intercept is y = −1 and one root is x = 1.

DISCUSSION OF ACTIVITY 3.13

(a) If there is any seasonality, it will be constant seasonal variation. The graph will look much like Figure 6.1 (b) in the prescribed
textbook.

(b) Taking x ≥ 0 (generally true for time series) and a > 0, the graph is an increasing convex parabola with minimum at point
(0, −1) and passing through point (1, 0). Please draw it. If seasonality exists, this is an example of an increasing seasonal
variation.

3.5 Use of dummy variables and trigonometric functions


We involve dummy variables and trigonometric functions to approximate some seasonal time series. Earlier we said that a time
series with constant seasonal variation is easy to work with. In this section we introduce dummy variables and discuss time series
analysis where we model seasonal variation by using dummy variables and trigonometric functions.
Trigonometric functions include functions such as cosine, sine, tangent, secant, cosecant and cotangent. This exposition is
limited to sine and cosine functions. In addition, they are not discussed in great depth.

3.5.1 Time series with constant seasonal variation
Every time series has a trend of some kind: increasing, decreasing or none at all. If in addition there is seasonality, we determine whether it is
constant or increasing seasonal variation. To represent a time series with constant seasonal variation we use a model of the form:

yt = T Rt + SNt + εt

where

yt = the observed value of the time series at time t

T Rt = the trend at time t

SNt = the seasonal factor at time t

εt = the error term (irregular factor) at time t

ACTIVITY 3.14

What is the value of T Rt , the trend, for a time series with no trend? Write down the above equation when there is no trend.
Assume that T Rt is a linear function of the form T Rt = β0 + β1 t.

DISCUSSION OF ACTIVITY 3.14

We recall that if a linear trend is increasing, then its slope β1 is positive while a decreasing trend has a negative slope. No trend
means that β1 = 0. Hence, the above model collapses to yt = β0 + SNt + εt .

DISCUSSION OF THE MODEL


The model implies that the time series with constant seasonal variation can be written as an average level µt together with
random fluctuations (as the error term εt ). These random fluctuations cause the observations to deviate from the average level. The
average level changes over time according to the equation:

µt = T Rt + SNt .

Now, the error term εt is a random variable. The assumption made about the error term is that it satisfies the usual regression
assumptions. Thus, we assume that the error terms have a constant variance, are identically and independently distributed (IID)
with a normal distribution. There is also a further implication that the magnitude of the seasonal swing is independent of the trend.
Let trt and snt be estimates of T Rt and SNt , respectively. Then the estimate of yt is:

ŷt = trt + snt .

Seasonality is a somewhat complex component of a time series. In the next section we use dummy variables to model seasonality.

3.5.2 Use of dummy variables
The seasonality of time series defines the seasons to be used. It is possible that a time series can be studied from observations that
are collected at different times of the day. An example is a pancake vendor who confirms that sales are very high in the morning,
low during the day and slightly higher in the afternoon. Here the seasons are the times of the day, and they are three in this case.
If we study a time series collecting data over a five-day week, the number of seasons is five. For some activities we may use a
seven-day week, allowing the number of seasons to be seven. If we use quarters of a year, there are four seasons. A common
tendency is to use months, in which case there will be twelve seasons. This simply means that the number of seasons will differ
from situation to situation, mainly depending on data collection pattern. In order to define dummy variables, we denote the number
of seasons by L.
Study the second rectangular box on p.299 of the textbook. We consider the seasonal factor SNt . We express this factor using
dummy variables as:

SNt = βs1 xs1,t + βs2 xs2,t + ... + βs(L−1) xs(L−1),t

where the constants βs1 , βs2 , ..., βs(L−1) are called the seasonal parameters and xs1,t , xs2,t , ..., xs(L−1),t are dummy variables
defined as:


xs1,t = 1 if time period t is season 1, and 0 otherwise

xs2,t = 1 if time period t is season 2, and 0 otherwise

⋮

xs(L−1),t = 1 if time period t is season (L − 1), and 0 otherwise
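To make the definitions concrete, here is a minimal Python sketch of how such dummy variables can be generated for quarterly data; the array layout and names are our own choices, not a prescribed format.

import numpy as np

L = 4   # number of seasons (quarters here)
n = 12  # three years of observations
t = np.arange(1, n + 1)

# Build the dummy columns x_{s1,t}, ..., x_{s(L-1),t}; the L-th season is the
# reference and gets no column, so all its dummies are zero.
season = (t - 1) % L  # season labels 0, 1, 2, 3, 0, 1, ...
X = np.zeros((n, L - 1))
for s in range(L - 1):
    X[:, s] = (season == s).astype(float)

print(X)  # every row for the reference season (t = 4, 8, 12) is all zeros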

3.5.3 High season and low season


When seasonal data are studied, there is always a set of values that are high and others that are low. The seasonal parameter of the last
season has been arbitrarily assigned the value zero; thus the last season is taken as the reference. The other parameters are defined
with respect to the one that we set at 0. That is, the last parameter is the one against which all the others are gauged.
It is important that when we interpret seasonality, we exclude any contribution made by the trend. Thus, in simple terms, when
we obtain a negative βi , then we know that the time series at the ith season is lower than the level of the time series in the last
season. The season is categorised as a low season. Similarly, if we obtain a positive βj , then we know that the time series at the j th
season is higher than the level of the time series in the last season. Such a season is called a high season. To identify high seasons
and low seasons from graphs we look for troughs and peaks. Troughs indicate low seasons and peaks indicate high seasons.

NOTEWORTHY POINTS

● One of the season parameters has to be set at 0, and not necessarily the last one. However, it is often more convenient to set
the last one, as we did. Note that some statistical packages take the first season as the reference instead of the last season. If
we do not set the dummy variables of one of the seasons to zero, the dummy columns are perfectly collinear with the intercept,
and least squares estimation breaks down or requires an unusual approach.

● The dummy variable model is based on the time series that display constant seasonal variation. It is also common to refer to
constant seasonal variation as additive seasonal variation. We often apply transformation methods to a time series that shows
increasing seasonal variation to equalise the seasonal variation before using dummy variables.

DISCUSSION OF EXAMPLE 6.7 IN THE PRESCRIBED TEXTBOOK.

Let us briefly discuss Example 6.7 in the prescribed textbook about short-term forecast (up to one year) of the number of
occupied rooms in four hotels in a city. The collected data, reproduced below, were for 14 years.

t yt t yt t yt t yt t yt t yt t yt
1 501 25 555 49 585 73 645 97 665 121 723 145 748
2 488 26 523 50 553 74 593 98 626 122 655 146 731
3 504 27 532 51 576 75 617 99 649 123 658 147 748
4 578 28 623 52 665 76 686 100 740 124 761 148 827
5 545 29 598 53 656 77 679 101 729 125 768 149 788
6 632 30 683 54 720 78 773 102 824 126 885 150 937
7 728 31 774 55 826 79 906 103 937 127 1067 151 1076
8 725 32 780 56 836 80 934 104 994 128 1038 152 1125
9 585 33 609 57 652 81 713 105 781 129 812 153 840
10 542 34 604 58 661 82 710 106 759 130 790 154 864
11 480 35 531 59 584 83 600 107 643 131 692 155 717
12 530 36 592 60 644 84 676 108 728 132 782 156 813
13 518 37 578 61 623 85 645 109 691 133 758 157 811
14 489 38 543 62 553 86 602 110 649 134 709 158 732
15 528 39 565 63 599 87 601 111 656 135 715 159 745
16 599 40 648 64 657 88 709 112 735 136 788 160 844
17 572 41 615 65 680 89 706 113 748 137 794 161 833
18 659 42 697 66 759 90 817 114 837 138 893 162 935
19 739 43 785 67 878 91 930 115 995 139 1046 163 1110
20 758 44 830 68 881 92 983 116 1040 140 1075 164 1124
21 602 45 645 69 705 93 745 117 809 141 812 165 868
22 587 46 643 70 684 94 735 118 793 142 822 166 860
23 497 47 551 71 577 95 620 119 692 143 714 167 762
24 558 48 606 72 656 96 698 120 763 144 802 168 877

This is a case of seasonal data with 12 seasons. The values of t are related to the various months. For example, when September
is mentioned in the first year, we set t = 9. The dummy variable M9 then takes the value 1 for September and 0 for all the other
months of the first year. The same procedure is applied for all years, with all the dummies set to zero for December, since December
is taken as the reference level in the definitions of the 11 dummy variables. For simplicity, the dummy variables were used as follows:
M1 for xs1,t , M2 for xs2,t , . . . , M11 for xs11,t since the seasons s1 , s2 , . . . sL−1 are the months
M1 , M2 , . . . , M11 . The logarithm transformation yt∗ = ln yt was used to obtain a relatively constant variation. The plots of the

original data, as well as the square and quartic roots of the data, are reported on pages 297 to 298 of the prescribed textbook,
but please plot the graphs yourself. A plot of yt∗ versus time shows that the seasonal variation is now constant, since the size of
the seasonal swing remains the same as the level of the time series increases.

The example asks for a forecast for January of the fifteenth year. The graph indicates that a linear trend is suitable for the data.
Therefore, the following model can be used for prediction:

yt∗ = β0 + β1 t + β2 M1 + β3 M2 + ... + β12 M11 + εt .

This is a multiple linear regression model with 12 predictor variables, and hence 13 parameters (one per predictor variable plus the
intercept) have to be estimated. It is impractical to fit this model by hand, so a statistical package is required. Excel can also be
used for some statistical analyses such as regression. The following is the Excel output of the analysis.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.9943
R Square 0.9886
Adjusted R Square 0.9878
Standard Error 0.0212
Observations 168
ANOVA
df SS MS F Significance F
Regression 12 6.0674 0.5056 1124.79015 4.975E-144
Residual 155 0.0697 0.0004
Total 167 6.1371

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 6.2875 0.0064 977.5404 0.0000 6.2748 6.3002
Time 0.0027 0.0000 80.5988 0.0000 0.0027 0.0028
M1 -0.0416 0.0080 -5.1862 0.0000 -0.0575 -0.0258
M2 -0.1121 0.0080 -13.9736 0.0000 -0.1279 -0.0962
M3 -0.0845 0.0080 -10.5317 0.0000 -0.1003 -0.0686
M4 0.0398 0.0080 4.9681 0.0000 0.0240 0.0557
M5 0.0204 0.0080 2.5441 0.0119 0.0046 0.0362
M6 0.1469 0.0080 18.3269 0.0000 0.1311 0.1627
M7 0.2890 0.0080 36.0588 0.0000 0.2732 0.3049
M8 0.3110 0.0080 38.8068 0.0000 0.2952 0.3269
M9 0.0560 0.0080 6.9861 0.0000 0.0402 0.0718
M10 0.0395 0.0080 4.9345 0.0000 0.0237 0.0554
M11 -0.1122 0.0080 -14.0030 0.0000 -0.1280 -0.0964

Hence, the fitted model is:

ŷt∗ = 6.2875 + 0.0027t − 0.0416M1 − 0.1121M2 − 0.0845M3 + 0.0398M4 + 0.0204M5


+0.1469M6 + 0.2890M7 + 0.3110M8 + 0.0560M9 + 0.0395M10 − 0.1122M11 .
For the month of January (where M1 = 1 and all the other dummies are 0), the model reduces to:

yt∗ = β0 + β1 t + β2 (1) + β3 (0) + ... + β12 (0) + εt

= β0 + β1 t + β2 + εt

Substituting the estimates, the model we will use to forecast January values is:

ŷt∗ = β0 + β1 t + β2

= 6.2875 + 0.0027t − 0.0416.

Now for January of the 15th year we note that 14 years include January of year 1 up to December of year 14, which makes
t = 168 months (14 × 12 months). Thus, for January of year 15 we have t = 169. The forecast required is therefore:


ŷ169 = 6.2875 + 0.0027 (169) − 0.0416

= 6.7022.

Since these were transformed data, the required value is ŷ169 = e6.7022 = 814.1951. It is as simple as this.

The computation of the 95% prediction interval of y169 requires a few more calculations, but they are not difficult. The average time is:

t̄ = (1/168) ∑_{t=1}^{168} t = (1 + 2 + ⋯ + 168)/168 = 84.5.

The sum of squared deviations of the times from the mean time is:

∑_{t=1}^{168} (t − t̄)² = (1 − 84.5)² + (2 − 84.5)² + ⋯ + (168 − 84.5)² = 395122.


Then, (169 − t̄)² = (169 − 84.5)² = 7140.25. Therefore, the 95% prediction interval of y∗169 is:

ŷ∗169 ± t_{0.025}^{(168−13)} s √(1 + 1/168 + (169 − t̄)²/∑_{t=1}^{168}(t − t̄)²) = 6.7022 ± 1.96(0.0212) √(1 + 1/168 + 7140.25/395122).

Hence, the 95% prediction interval of y∗169 is: [6.6602; 6.7442]. The 95% prediction interval of y169 , obtained by exponentiation,
is the following:
[e^6.6602 ; e^6.7442 ] = [780.7071; 849.1196].

This prediction interval states that the owner of the hotel can be 95% confident that in period 169, that is January of year 15, the
average number of rooms occupied will not be less than 780 and not greater than 850 per day.
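The point forecast and prediction interval above can be verified with a few lines of Python. This is a sketch only: it reuses the rounded Excel estimates and, like the guide, approximates t(155)_{0.025} by 1.96.

import math

n, s = 168, 0.0212
t_bar = sum(range(1, n + 1)) / n                        # 84.5
ss_t = sum((t - t_bar) ** 2 for t in range(1, n + 1))   # 395122
t_new = 169

y_hat = 6.2875 + 0.0027 * t_new - 0.0416                # January, so M1 = 1
half = 1.96 * s * math.sqrt(1 + 1 / n + (t_new - t_bar) ** 2 / ss_t)
print(math.exp(y_hat - half), math.exp(y_hat + half))   # about 780.7 and 849.1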

ACTIVITY 3.15

Consider the model:

yt = 5 − 2M1 + 4M2 + 3M3 − 14M4 + M5 + εt

(a) Identify the:

(i) Number of seasons. Provide any necessary explanation.


(ii) Trend term. Provide its nature.
(iii) Low seasons (and justify your choice).
(iv) High seasons (and justify your choice).

(b) Present models for each season.

(c) For simplicity and practicality, let us assume that the above model is based on a six-day week. Prepare forecasts for:

(i) The Saturday of the first week


(ii) The Thursday of the fifth week
(iii) The Friday of the third week
(iv) The Monday of the tenth week

DISCUSSION OF ACTIVITY 3.15

(a) (i) We see five seasons being displayed in the model. This means that the sixth seasonal parameter has been set to 0. Thus,
six seasons are involved.
(ii) The trend term comes only from 5, which is a constant term. Therefore there is no trend.

(iii) The first and the fourth seasons are low seasons because they have negative seasonal parameters.
(iv) The second, third and fifth season are high because they have positive seasonal parameters.

(b) The models for the various seasons are:


First season y1 = 5 − 2(1) + 4 (0) + 3 (0) − 14 (0) + (0) + εt

= 3 + εt

Second season y2 = 5 − 2 (0) + 4(1) + 3 (0) − 14 (0) + (0) + εt

= 9 + εt

Third season y3 = 5 − 2 (0) + 4 (0) + 3(1) − 14 (0) + (0) + εt

= 8 + εt

Fourth season y4 = 5 − 2 (0) + 4 (0) + 3 (0) − 14(1) + (0) + εt

= −9 + εt

Fifth season y5 = 5 − 2 (0) + 4 (0) + 3 (0) − 14 (0) + (1) + εt

= 6 + εt

Sixth season y6 = 5 − 2 (0) + 4 (0) + 3 (0) − 14 (0) + (0) + εt

= 5 + εt

(c) (i) The Saturday of the first week.


Saturday coincides with the sixth season. In the first week this is t = 6.

ŷ6 = 5

(ii) The Thursday of the fifth week.


Thursday corresponds to the fourth season. For the Thursday of the fifth week, four full weeks have passed, so t = [(4 × 6) + 4] = 28.
This means M4 = 1. Hence,

ŷ28 = 5 − 14 = −9

(iii) The Friday of the third week


Friday is the fifth season, and in the third week the Friday is t = [(2 × 6) + 5] = 17. This means M5 = 1.

ŷ17 = 5 + 1 = 6

(iv) The Monday of the tenth week

Monday is the first season. The Monday of the tenth week is t = [(9 × 6) + 1] = 55. This means M1 = 1, and the forecast for that Monday is:

ŷ55 = 5 − 2 = 3

Note that the trend was given by T Rt = 5. This is a constant, which effectively implies that there is no trend. We next look at
the use of trigonometric functions.

3.5.4 Use of trigonometry in a model with a linear trend
It is common that trigonometric terms are incorporated in a time series regression model that shows either constant or increasing
seasonal variation. The general form of such incorporation is:

yt = T Rt + f (t) + εt

where

f (t) = an expression of trigonometric functions at time t.

The two most used trigonometric models for constant variation and linear trend are the following: Let us assume a linear trend
and suppose that:
yt = β0 + β1 t + β2 sin(2πt/L) + β3 cos(2πt/L) + εt

and
yt = β0 + β1 t + β2 sin(2πt/L) + β3 cos(2πt/L) + β4 sin(4πt/L) + β5 cos(4πt/L) + εt

where L is the number of seasons.
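Before fitting such a model, the trigonometric regressors must be computed at every time point. A minimal Python sketch of this step (our own layout, shown here for monthly data) is:

import numpy as np

L = 12                 # number of seasons (months)
t = np.arange(1, 169)  # 14 years of monthly time points

# Columns of the second model: t plus the four trigonometric terms.
X = np.column_stack([
    t,
    np.sin(2 * np.pi * t / L),
    np.cos(2 * np.pi * t / L),
    np.sin(4 * np.pi * t / L),
    np.cos(4 * np.pi * t / L),
])
print(X.shape)  # (168, 5); with the intercept this gives six parameters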


You may experiment with various values of t, but attend to the next activity.

ACTIVITY 3.16

Simplify the two models when:

(i) t = L

(ii) t = L/2

DISCUSSION OF ACTIVITY 3.16

(i) If t = L, then the first model simplifies to:

yL = β0 + β1 L + β2 sin(2πL/L) + β3 cos(2πL/L) + εL
= β0 + β1 L + β2 sin(2π) + β3 cos(2π) + εL
= β0 + β1 L + β3 + εL .

The second model simplifies to:


yL = β0 + β1 L + β2 sin(2πL/L) + β3 cos(2πL/L) + β4 sin(4πL/L) + β5 cos(4πL/L) + εL
= β0 + β1 L + β2 sin(2π) + β3 cos(2π) + β4 sin(4π) + β5 cos(4π) + εL
= β0 + β1 L + β3 + β5 + εL .

(ii) If t = L/2, then the first model simplifies to:

yL/2 = β0 + β1 (L/2) + β2 sin(2π(L/2)/L) + β3 cos(2π(L/2)/L) + εL/2
= β0 + β1 (L/2) + β2 sin(π) + β3 cos(π) + εL/2
= β0 + β1 (L/2) − β3 + εL/2 .

The second model simplifies to:

yL/2 = β0 + β1 (L/2) + β2 sin(π) + β3 cos(π) + β4 sin(2π) + β5 cos(2π) + εL/2
= β0 + β1 (L/2) − β3 + β5 + εL/2 .

ACTIVITY 3.17
Simplify the two models when t = L/4.

DISCUSSION OF ACTIVITY 3.17

Appropriate substitutions give (try yourself):

(1) First model: yL/4 = β0 + β1 (L/4) + β2 + εL/4 .

(2) Second model: yL/4 = β0 + β1 (L/4) + β2 − β5 + εL/4 .

DISCUSSION OF EXAMPLE 6.8 IN THE PRESCRIBED TEXTBOOK.

Let us look again at the hotel data presented above and analysed using dummy variables. Since the transformed data have a linear
trend and constant seasonal variation, and there are 12 seasons, we can also analyse the data using the following trigonometric
regression model:
yt∗ = β0 + β1 t + β2 sin(2πt/12) + β3 cos(2πt/12) + β4 sin(4πt/12) + β5 cos(4πt/12) + ϵt
where yt∗ = ln yt .

Fitting the model in Excel first requires the trigonometric transformations of the time points. For instance, if t = 1 is in cell
AJ2 and we need to put sin(2πt/12) in cell AK2, we type in cell AK2 the expression = SIN(2 ∗ PI() ∗ AJ2/12), and then
copy the formula down column AK for the remaining time points. The same procedure is repeated for the other trigonometric
terms. The Excel output of the trigonometric regression model is the following:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.9577
R Square 0.9172
Adjusted R Square 0.9146
Standard Error 0.0560
Observations 168

ANOVA
df SS MS F Significance F
Regression 5 5.6289 1.1258 358.8645 1.14159E-85
Residual 162 0.5082 0.0031
Total 167 6.1371
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 6.3327 0.0087 728.3737 0.0000 6.3156 6.3499
t 0.0027 0.0001 30.6389 0.0000 0.0026 0.0029
X1 -0.1009 0.0061 -16.4862 0.0000 -0.1130 -0.0888
X2 -0.1266 0.0061 -20.7208 0.0000 -0.1387 -0.1146
X3 0.0662 0.0061 10.8359 0.0000 0.0542 0.0783
X4 0.0190 0.0061 3.1076 0.0022 0.0069 0.0311

where X1 = sin(2πt/12), X2 = cos(2πt/12), X3 = sin(4πt/12), and X4 = cos(4πt/12).

Hence, the fitted model is:

ŷt∗ = 6.3327 + 0.0027t − 0.1009 sin(2πt/12) − 0.1266 cos(2πt/12) + 0.0662 sin(4πt/12) + 0.0190 cos(4πt/12).

It follows from this fitted model that the point forecast of y∗169 is:

ŷ∗169 = 6.3327 + 0.0027(169) − 0.1009 sin(2π(169)/12) − 0.1266 cos(2π(169)/12) + 0.0662 sin(4π(169)/12) + 0.0190 cos(4π(169)/12)
= 6.6957.

The point forecast for January of the 15th year, obtained by exponentiation of the above result, is ŷ169 = e^6.6957 = 808.920.
Noting that the model has six parameters and that the standard error is s = 0.0560, the 95% prediction interval of y∗169 is:

ŷ∗169 ± t_{0.025}^{(168−6)} s √(1 + 1/168 + (169 − t̄)²/∑_{t=1}^{168}(t − t̄)²) = 6.6957 ± 1.96(0.0560) √(1 + 1/168 + 7140.25/395122) = [6.5846; 6.8068].

Hence, the 95% prediction interval for y169 , obtained by exponentiating the above results is:

[e6.5846 ; e6.8068 ] = [723.8614; 903.9735].

Note that this prediction interval is wider, that is less precise, than the prediction interval [780.7071; 849.1196] obtained using
dummy variables. The widths of the two prediction intervals are 903.9735 − 723.8614 = 180.1121 and 849.1196 − 780.7071 =
68.4125, respectively.

3.6 Growth curve models


The models we discussed so far described trend and seasonal effects using deterministic functions of time that are linear in the
parameters. In this section we discuss the growth curve model, which is one of the models that are not linear in the parameters.
Using the usual notation, the growth curve model is:

yt = β0 β1^t εt

This is a complicated model that we are not going to strive to unravel in full. However, since the decomposition of a time series may
be multiplicative, we may end up with a time series that resembles a growth curve, and the question is then how to handle it. Since
linear forms are easier to handle, we transform the model to a linear form. If we assume that the parameters are positive, we can
take the natural logarithm on both sides of the model equation to obtain the following linear form:

ln yt = ln β0 + (ln β1 ) t + ln εt

which can be written as:

yt∗ = β0∗ + β1∗ t + ε∗t

This is a familiar form once you understand how the transformation is done. We can work with the transformed data and transform
the answers back using the inverse of the transformation.
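As a quick illustration, here is a hedged Python sketch of this transform-fit-back-transform cycle, using the steakhouse data of Example 6.9 discussed below; the printed numbers are approximate.

import numpy as np

y = np.array([11, 14, 16, 22, 28, 36, 46, 67, 82, 99, 119, 156, 257, 284, 403])
t = np.arange(1, 16)

# Fit ln(y_t) = b0* + b1* t by least squares, then invert the log transform.
b1_star, b0_star = np.polyfit(t, np.log(y), 1)
b0, b1 = np.exp(b0_star), np.exp(b1_star)
print(round(b0, 4), round(b1, 4))  # about 7.9256 and 1.2929
print(round(b0 * b1 ** 16, 1))     # point prediction for year 16, about 483.2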
DISCUSSION OF EXAMPLE 6.9 IN THE PRESCRIBED TEXTBOOK.

The example is about steakhouses opened over the last 15 years as reported in the following table:

Year(t) yt ln yt (t − t̄)2
1 11 2.3979 49
2 14 2.6391 36
3 16 2.7726 25
4 22 3.0910 16
5 28 3.3322 9
6 36 3.5835 4
7 46 3.8286 1
8 67 4.2047 0
9 82 4.4067 1
10 99 4.5951 4
11 119 4.7791 9
12 156 5.0499 16
13 257 5.5491 25
14 284 5.6490 36
15 403 5.9989 49
Total 280

An analyst wanted to predict the number of steakhouses that will be operating next year.

The plot of the number of steakhouses (yt ) versus Year (t) is the following:

Clearly, this graph indicates that the data do not exhibit a linear trend. The plot of ln yt versus t is the following:

This graph indicates that a model with linear trend is suitable for the log-transformed data. Excel output for fitting the model

yt∗ = β0∗ + β1∗ t + ε∗t

is given below.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.9980
R Square 0.9960
Adjusted R Square 0.9957
Standard Error 0.0755
Observations 15

ANOVA
df SS MS F Significance F
Regression 1 18.4765 18.4765 3239.9689 0.0000
Residual 13 0.0741 0.0057
Total 14 18.5506

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 2.0701 0.0410 50.4510 0.0000 1.9815 2.1588
Time 0.2569 0.0045 56.9207 0.0000 0.2471 0.2666

Hence, the fitted model is: ŷt∗ = 2.0701 + 0.2569t. To obtain the corresponding nonlinear model, note that β̂0 = exp(2.0701) =
7.9256 and β̂1 = exp(0.2569) = 1.2929. Therefore, the fitted nonlinear model is:

ŷt = 7.9256 (1.2929t ) .

The point prediction of y∗16 is: ŷ∗16 = 2.0701 + 0.2569(16) = 6.1805, which implies that the point prediction of y16 is:
ŷ16 = e^6.1805 = 483.2335. Therefore, the number of operating steakhouses in year 16 is approximately 483.

Now, let us calculate the 95% prediction intervals for y∗16 and y16.

The average time is t̄ = (1/15) ∑_{t=1}^{15} t = (1 + 2 + ⋯ + 15)/15 = 8. The squared differences (t − t̄)² and their sum have been
included in the table containing the original and logged data. In addition, (16 − t̄)² = 8² = 64, the standard error is s = 0.0755
and t_{α/2}^{(n−2)} = t_{0.025}^{(13)} = 2.160.

Hence, the 95% prediction interval for y∗16 is:

ŷ∗16 ± t_{0.025}^{(13)} s √(1 + 1/15 + (16 − t̄)²/∑_{t=1}^{15}(t − t̄)²) = 6.1805 ± 2.160(0.0755) √(1 + 1/15 + 64/280) = [5.9949; 6.3661]

The 95% prediction interval of y16, obtained by exponentiating the prediction limits of y∗16, is the following:

[exp(5.9949); exp(6.3661)] = [401.3765; 581.7844].

Thus, the analyst is 95% confident that, on average, the number of operating steakhouses in year 16 will not be less than 401 and
will not be more than 582.

3.7 AR(1) and AR(p)


The first contact with first-order autoregressive models, or AR(1), was through the Durbin-Watson (DW) statistic. When the error
terms of a regression model are autocorrelated, the model is not adequate and the autocorrelation should be modelled. Modelling
autocorrelation in depth is beyond the scope of this module, so a detailed treatment is deferred to subsequent modules.

The AR(1) is a special case of the general autoregressive process of order p given by:

εt = ϕ1 εt−1 + ϕ2 εt−2 + ... + ϕp εt−p + at

ACTIVITY 3.18

Write a model for AR(2), a second-order autoregressive model.

DISCUSSION OF ACTIVITY 3.18

The activity requires AR(p) when p = 2, which is:

εt = ϕ1 εt−1 + ϕ2 εt−2 + at .
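To see what such a process looks like, the following minimal Python sketch simulates an AR(2) error series; the coefficients are illustrative only and are not taken from the textbook.

import numpy as np

# eps_t = phi1 * eps_{t-1} + phi2 * eps_{t-2} + a_t, with random shocks a_t.
rng = np.random.default_rng(0)
phi1, phi2, n = 0.5, 0.3, 200
a = rng.normal(0, 1, n)

eps = np.zeros(n)
for t in range(2, n):
    eps[t] = phi1 * eps[t - 1] + phi2 * eps[t - 2] + a[t]
print(eps[-5:])  # the last few simulated autocorrelated errors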

3.8 Use of trend and seasonality in forecast development


Make sure that your mathematical model for forecasting reflects the trend and the seasonal pattern if they are supposed to be there.
Also, you will have to modify the model appropriately once you have decided on the exact period for which the forecast is needed.
For example, if you know that it is the fourth quarter of the ninth year, write the model to suit that period before making any
substitutions.

To make sure that you have the right period, your polynomial as a function of t should suit the given time period. As an example,
starting from January of the current year, evaluating ŷ20 when time is measured in months would refer to August of the following year.
This means that the equation used should be suitable for August months. On the other hand, if quarters are used, starting from the
current year, ŷ20 would mean the fourth quarter of the fifth year. If you are dealing with months, and your equation is given as a
function of t, any future time should be converted into months. For example, if you are required to predict a value for February
of the fourth year from the current year, you should be able to set t = 38. If you are to find a prediction for the third quarter of the
seventh year, you should be able to set t = 27 in the quarterly model. A small helper like the one sketched below makes such
conversions routine.
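A minimal sketch of such a conversion, assuming period 1 of year 1 is the first period of the current year (the function names are our own):

def month_index(year, month):
    # t for a given month: full years elapsed times 12, plus the month.
    return (year - 1) * 12 + month

def quarter_index(year, quarter):
    # t for a given quarter: full years elapsed times 4, plus the quarter.
    return (year - 1) * 4 + quarter

print(month_index(4, 2))    # February of the fourth year: t = 38
print(quarter_index(7, 3))  # third quarter of the seventh year: t = 27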

3.9 Conclusion
This unit introduced important aspects of time series. It used graphical plots to demonstrate some patterns, then incorporated some
applications of estimation. Trend and seasonality were discussed. The AR(1) process was also introduced, and the DW statistic
was used to detect positive and negative first-order autocorrelation. Two types of seasonal variation, constant and increasing,
were discussed. Modelling using dummy variables and trigonometric functions was discussed in detail. Growth models were also
discussed. Real-life examples were used to illustrate the theoretical components.
We are now ready for the next important unit.

Unit 4

Decomposition of a time series

Outcome table for the study unit.


This study unit focuses on the multiplicative and the additive decomposition of time series. The outcomes of the unit are:

• Estimate each component of a multiplicative decomposition of a time series.

• Estimate each component of an additive decomposition of a time series.

• Conduct a seasonal adjustment of a time series model.

The following table gives further details on the unit’s outcomes.

Outcomes (at the end of the unit you should be able to) | Assessment | Content | Activities | Feedback
describe, or unpack, a time series | decompose a time series | identify time series components | work out exercises | discuss the activity
compute the trend | determine trend analysis | moving averages (MA) | describe the data trend | discuss the activities
deseasonalise a time series | determine seasonal indices | MA, seasonal indices | isolate trend and seasonality | discuss the activities
forecast future values of a time series | develop forecasts | centered MA | incorporate the trend in the forecast | discuss the activities

4.1 Introduction
This is a continuation of concepts introduced in earlier chapters. Components of a time series should now be at your fingertips.
Are they? This unit deals with the decomposition of a time series, which aims to isolate the influence of each of the components
on the actual time series. It is presented as Chapter 7 in the prescribed textbook.

Decomposition of time series is an important technique for all types of time series, especially for seasonal adjustment. It seeks to
construct, from an observed time series, a number of component series (that could be used to reconstruct the original time series
by addition or multiplication), where each of these has a certain characteristic or type of behaviour.

ACTIVITY 4.1

State the components of a time series.

DISCUSSION OF ACTIVITY 4.1

You can find the answer in Unit 1 if you have forgotten.

The components into which a time series can be decomposed are:

● the Trend Component Tt that reflects the long term progression of the series

● the Cyclical Component Ct that describes repeated but non-periodic fluctuations, for instance caused by the economic cycle

● the Seasonal Component St reflecting seasonality (Seasonal variation)

● the Irregular Component It (or “noise”) that describes random, irregular influences. Compared to the other components it
represents the residuals of the time series.

4.2 Multiplicative decomposition


Multiplicative decomposition is applicable when a time series displays an increase in the amplitude of both the seasonal and irregular
variations as the level of the trend rises.

Multiplicative decomposition of a time series assumes that the actual values of a time series yt can be represented as a product
of the trend component T Rt , a seasonal index SNt , a cyclical index CLt and an irregular measure IRt . The trend component is
measured in actual units, the cyclical index is expressed relative to the trend, and the seasonal index is expressed relative to
the trend and the cyclical index. Thus, the multiplicative decomposition model is:

yt = T Rt × SNt × CLt × IRt .

When a time series exhibits increasing seasonal variation, it is represented in this form. Statistical analysis is useful for effective
isolation and analysis of the trend and the seasonal components. Hence, we will examine statistical approaches to quantify trend
and seasonal variations. These are the components that usually account for a significant proportion of the actual values in a time
series. Isolating them is an opportunity to explain the actual time series values.

4.2.1 Trend analysis
This discussion uses moving averages to analyse trend. Averaging out the short-term fluctuations in a time series reveals the trend:
either a smooth curve or a straight line emerges. Earlier we discussed time series regression, which is one method used to isolate
trend. The other is the use of moving averages (MAs), which we discuss in this section.

The term ‘trend’ may be seen as a tendency or resulting behaviour of occurrence of something observed over a long term. In a
nutshell, “trend analysis” is a term referring to the concept of collecting information and attempting to spot a pattern, or trend, in
the information. In some fields of study, the term “trend analysis” has more formally-defined meanings.

For example, in project management, trend analysis is a mathematical technique that uses historical results to predict future
outcomes. This is achieved by tracking variances in cost and schedule performance. In this context, it is a project management
quality control tool.

Although trend analysis is often used to predict future events, it could be used to estimate uncertain events in the past, such
as how many ancient kings probably ruled between two dates, based on data such as the average years which other known kings
reigned.

Moving Average (MA)


A MA is a successive averaging of groups of observations. The number n of observations averaged in each group must be
the same throughout; it is determined by the number of periods that span the short-term fluctuations. A MA removes the short-term
fluctuations in a time series and smooths it.
We explain the MA for a 3-period MA. The following steps are involved:

● Add observations for the first three periods and find their average. Place the answer opposite the middle time period, i.e.
opposite the second measurement.

● Remove the observation for the earliest period and replace it by the fourth measurement. Obtain the new average and place
it opposite the third measurement.

● Repeat the process until you do not have enough observations to produce a MA of three periods.

Note that the above illustration used a case where the MA can be placed next to a middle observation. The same is easy when a
5-period or a 7-period MA is needed; that is, for an odd-order MA we will not struggle to place the MA in the middle. There are
practical cases where we need to use a 2-period MA, a 4-period MA, and so on. The prescribed textbook provides several examples,
and we will see some examples in the activities of this unit; a short computational sketch is also given below.
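A k-period MA takes one line of Python; the sketch below uses the data of Activity 4.2 that follows, so you can check the answers given there.

def moving_average(values, k):
    # Average each group of k successive observations.
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]

sales = [170, 140, 230, 176, 152, 233, 182, 161, 242]
print(moving_average(sales, 3))  # [180.0, 182.0, 186.0, 187.0, 189.0, 192.0, 195.0]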

ACTIVITY 4.2

Consider the following data:

170, 140, 230, 176, 152, 233, 182, 161, 242

They were collected for three days over the regular time periods 8–12 noon, 12–4 p.m. and 4–8 p.m. Calculate appropriate
moving averages and explain the trend of the data.

DISCUSSION OF ACTIVITY 4.2

We can call these times morning, afternoon and evening for convenience. It seems obvious to use a 3-period MA. According
to the guideline given, we start with averages per day:

Average
Day 1 Morning 170
Afternoon 140 540/3 = 180
Evening 230

Day 2 Morning 176


Afternoon 152 561/3 = 187
Evening 233

Day 3 Morning 182


Afternoon 161 585/3 = 195
Evening 242

The average for each day has been placed opposite the midpoint of that day, i.e., the afternoon period. We need a trend figure
for every period, not just for the afternoons, and this is not yet truly a moving average. We make the averages "move" by removing
the oldest observation and replacing it with the newest one. The table becomes:

Moving average = trend


Day 1 Morning 170
170 + 140 + 230
Afternoon 140 = 180
3
140 + 230 + 176
Evening 230 = 182
3

Day 2 Morning 176 186


Afternoon 152 187
Evening 233 189

Day 3 Morning 182 192


Afternoon 161 195
Evening 242

Now, we answer the question about the trend. We note that the MAs are clearly increasing. This simply informs us that on
average, the above observations are increasing. Hence, we have an increasing trend.

4.2.2 Seasonal analysis


Seasonal analysis isolates the influence of forces that are due to seasonality in a time series. Useful methods known in time series
analysis for dealing with seasonality include the ratio-to-moving-average method. It measures seasonal influences using index
numbers, which are percentage deviations from actual values of the series from base values. In this module we use seasonal
variation analysis to accomplish the task.

The steps for quantifying the seasonal variation are:

● Calculate seasonal deviations by dividing the observations by the corresponding MAs

● Group the ratios according to season

● Average the ratios in the groups

ACTIVITY 4.3

Use the data in the previous activity to estimate seasonal indices.

DISCUSSION OF ACTIVITY 4.3

We use the table above and calculate according to the first step.

Step 1: actual value divided by MA

Day 1 Morning −
Afternoon 140/180 = 0.7778
Evening 230/182 = 1.2637

Day 2 Morning 176/186 = 0.9462
Afternoon 152/187 = 0.8128
Evening 233/189 = 1.2328

Day 3 Morning 182/192 = 0.9479
Afternoon 161/195 = 0.8256
Evening −

Due to random influences, values for the same periods differ, but it is clear that there is a common pattern. For example, the
afternoon values of Actual/MA are similar in size (0.78, 0.81, 0.83). The same is true of the evening figures 1.26 and 1.23; and
the morning figures are both about 0.95. The seasonal indices are found by computing the averages of Actual/MA per season,
as in the following table.
season as in the following table.

Step 2 Morning Afternoon Evening


Day 1 − 0.7778 1.2637
Day 2 0.9462 0.8128 1.2328
Day 3 0.9479 0.8256 −
Total 1.8941 2.4162 2.4965
Average snt 0.9471 0.8054 1.2483

The snt numbers 0.9471, 0.8054 and 1.2483 are unadjusted seasonal indices, since their sum 3.0008 is not equal to the number of
seasons L = 3. Adjusted seasonal indices are calculated by multiplying each one by the correction factor

L / ∑_{t=1}^{L} snt

where in this case L = 3. Therefore, the seasonal indices in this example are:

sn1 = 0.9471 × 3/3.0008 = 0.9468.

sn2 = 0.8054 × 3/3.0008 = 0.8052.

sn3 = 1.2483 × 3/3.0008 = 1.2480.
Note that sn1 + sn2 + sn3 = L = 3 as required for seasonal indices.
We have now isolated the trend and seasonal effects present in the time series. Knowledge of seasonal effects is important for
forecasting as well as for removing strong seasonal effects that may conceal other important features or movements in a data set.
Random variations, being essentially unpredictable, are of no direct use in forecasting, but their size serves as a
guide to the reliability of a forecast. When the random influences are very small, a process is likely to produce reliable forecasts,
while large fluctuations may completely upset even the most carefully calculated forecasts.

4.2.3 Determination of the trend


Our next step in analysing the time series is to calculate the trend after removing the seasonality effect. To do so, we must first
compute the deseasonalised observations in each period t by dividing the observed values yt by the seasonal indices snt , that is:

dt = yt / snt .
For our example, see the deseasonalised observations in column 5 of the following table (later referred to as the main table).

Time (t) yt MA snt dt trt trt × snt clt × irt clt irt
1 170 - 0.9468 179.5522 177.0841 167.6632 1.0139 - -
2 140 180 0.8052 173.8698 179.6232 144.6326 0.9680 0.9979 0.9700
3 230 182 1.248 184.2949 182.1623 227.3386 1.0117 0.9954 1.0164
4 176 186 0.9468 185.8893 184.7014 174.8753 1.0064 1.0088 0.9977
5 152 187 0.8052 188.773 187.2405 150.7661 1.0082 0.9995 1.0087
6 233 189 1.248 186.6987 189.7796 236.8449 0.9838 0.9972 0.9866
7 182 192 0.9468 192.2264 192.3187 182.0873 0.9995 1.0031 0.9964
8 161 195 0.8052 199.9503 194.8578 156.8995 1.0261 1.0027 1.0234
9 242 - 1.248 193.9103 197.3969 246.3513 0.9823 - -

Now, assume that the trend is linear and thus can be modelled as

T Rt = β0 + β1 t + ϵt

where ϵt are error terms with mean zero. The estimation of the parameters β0 and β1 can be done using least squares.
The following table contains information to be used in the estimation.

Time (t) dt tdt t2
1 179.5522 179.5522 1
2 173.8698 347.7397 4
3 184.2949 552.8846 9
4 185.8893 743.5572 16
5 188.773 943.8649 25
6 186.6987 1120.192 36
7 192.2264 1345.585 49
8 199.9503 1599.603 64
9 193.9103 1745.192 81
Total 45 1685.165 8578.171 285
Hence,

β̂1 = (9 ∑_{t=1}^{9} t dt − ∑_{t=1}^{9} t ∑_{t=1}^{9} dt) / (9 ∑_{t=1}^{9} t² − (∑_{t=1}^{9} t)²) = (9 × 8578.171 − 45 × 1685.165)/(9 × 285 − 45²) = 2.5391

and

β̂0 = (∑_{t=1}^{9} dt − β̂1 ∑_{t=1}^{9} t)/9 = (1685.165 − 2.5391 × 45)/9 = 174.5450.

Hence, the fitted equation of the trend is: trt = 174.5450 + 2.5391t. The trend values in our example are reported in the sixth
column of the main table. For instance, the first value is tr1 = 174.5450 + 2.5391(1) = 177.0841.

4.2.4 Estimation of the cyclical and irregular values


It follows from the equation yt = trt × snt × clt × irt that:

clt × irt = yt / (trt × snt).

In the present example, the above values are reported in the 8th column of the main table. The above equation does not allow the
estimation of CLt but, as stated in the prescribed textbook, CLt can be estimated by the following three-period moving average:

clt = (clt−1 irt−1 + clt irt + clt+1 irt+1) / 3.

The estimates of the cyclical components are reported in the 9th column of the main table. Once the cyclical components are
estimated, the irregular components are estimated as follows:

irt = (clt × irt) / clt.
For our example, the last column of the main table gives the estimates of the irregular components.

4.2.5 Obtaining a forecast


The central purpose of time series analysis is to develop forecasts. If a multiplicative decomposition model is used and the four components
are estimated, the point forecasts are given by
ŷt = trt × snt × clt × irt .

If there is no pattern in the irregular component, we can predict that all values irt are equal to one. In that case, we have:

ŷt = trt × snt × clt

if a well-defined cycle exists in the time series. If there is no well-defined cycle, that is when the clt values are close to one, the
point forecasts are given by:
ŷt = trt × snt .
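Putting the pieces together, the sketch below reproduces, in Python, the deseasonalise-then-fit-trend steps for the data of Activity 4.2, using the adjusted seasonal indices derived earlier; the printed values match the text up to rounding.

import numpy as np

y = np.array([170, 140, 230, 176, 152, 233, 182, 161, 242], dtype=float)
t = np.arange(1, 10)
sn = np.tile([0.9468, 0.8052, 1.2480], 3)  # adjusted seasonal indices

d = y / sn                     # deseasonalised observations d_t
b1, b0 = np.polyfit(t, d, 1)   # least squares trend fitted to d_t
tr = b0 + b1 * t
print(round(b0, 4), round(b1, 4))  # about 174.5450 and 2.5391
print(np.round(tr * sn, 1))        # point forecasts tr_t x sn_t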

ACTIVITY 4.4

Consider the example given in Activity 4.2.

(a) Does the time series exhibit a well-defined cycle? Explain your answer.

(b) Calculate the point forecasts for all the 9 observations.

DISCUSSION OF ACTIVITY 4.4

(a) The time series does not exhibit a well-defined cycle since all the cyclical values are close to one.

(b) Since all cyclical values and irregular values are close to one, point forecasts are determined by the equation

ŷt = trt × snt .

The point forecasts for the 9 observations are displayed in the trt × snt column of the main table.

ACTIVITY 4.5

The following data represent sales of pies in thousands in the various quarters of the years 2001 to 2003. Using a multiplicative
decomposition of a time series, determine the estimates of the four components.

Quarter 1 2 3 4 1 2 3 4 1 2 3 4
Value 142 54 162 206 130 50 174 198 126 42 162 186

DISCUSSION OF ACTIVITY 4.5

All the procedures for analysing time series with seasonal patterns have been explained. The appropriate number of periods for the
moving average is clearly four, so the first moving average is:

(142 + 54 + 162 + 206)/4 = 141

This figure belongs to the middle of the first year, which falls halfway between the second and third quarters, and the same is
true of the other moving averages. Therefore we need an additional step called centring the moving averages; the results of this
centring give rise to the centred moving averages (CMAs). The moving average 141 applies to a point halfway between the second
and third quarters of year 2001, while the figure 138 applies midway between the third and fourth quarters, so we can obtain a
moving average directly comparable with the third quarter by taking the average of 141 and 138, which is 139.5. Doing the same
for all moving averages, we obtain the values in the fifth and sixth columns of the table below.
To calculate the seasonal indices, we first calculate the ratios yt/CMA, as displayed in the seventh column of the table. The
seasonal indices are then calculated in a similar manner to Activity 4.3. The following table gives all the values.
seasonal indices are calculated in the similar manner as the one we had in Activity 4.3. The following table gives all values.

Year Quarter 1 Quarter 2 Quarter 3 Quarter 4 Total
2001 − − 1.1613 1.4982
2002 0.9386 0.3597 1.2655 1.4559
2003 0.9438 0.3218 − −
Total 1.8824 0.6815 2.4268 2.9541
Mean (unadjusted) 0.9412 0.34075 1.2134 1.47705 3.9724
Adjusted snt 0.9477 0.3431 1.2218 1.4873 4

These indices were copied into the 8th column of the main table, and the deseasonalised values dt = yt/snt were then inserted in
the 9th column of the table. The deseasonalised values were used as response values in the linear model
column of the table. The deseasonalised values were used as response values in the linear model

d t = β0 + β1 t + ϵt .

The following table provides useful information for parameter estimation and its last two columns will be used later for interval
prediction.

Time (t) dt tdt t2 (t − t̄)2 trt (dt − trt )2


1 149.8 149.8 1 30.25 149.2167 0.384085
2 157.4 314.8 4 20.25 147.0831 106.2016
3 132.6 397.8 9 12.25 144.9495 152.7261
4 138.5 554 16 6.25 142.8159 18.57509
5 137.2 685.9 25 2.25 140.6823 12.30669
6 145.7 874.4 36 0.25 138.5487 51.57262
7 142.4 996.9 49 0.25 136.4151 35.97281
8 133.1 1065 64 2.25 134.2815 1.33254
9 133 1197 81 6.25 132.1479 0.648937
10 122.4 1224 100 12.25 130.0143 57.77534
11 132.6 1459 121 20.25 127.8807 22.18936
12 125.1 1501 144 30.25 125.7471 0.473714
Total 78 1650 10418 650 143 460.1589

Hence,

β̂1 = (12 ∑_{t=1}^{12} t dt − ∑_{t=1}^{12} t ∑_{t=1}^{12} dt) / (12 ∑_{t=1}^{12} t² − (∑_{t=1}^{12} t)²) = (12 × 10418.49 − 78 × 1649.783)/(12 × 650 − 78²) = −2.1336

and

β̂0 = (∑_{t=1}^{12} dt − β̂1 ∑_{t=1}^{12} t)/12 = (1649.783 + 2.1336 × 78)/12 = 151.3503.
Hence, the fitted equation of the trend is: trt = 151.3503 − 2.1336t. The trend values in our example are reported in the tenth
column of the main table.
For instance, the first value is tr1 = 151.3503 − 2.1336(1) = 149.2167.

Yr Q t yt MA CMA yt/CMA snt dt trt trt × snt clt × irt clt irt
1 1 1 142 - - - 0.95 149.84 149.22 141.41 1.00 - -
1 2 2 54 - - - 0.34 157.39 147.08 50.46 1.07 1.00 1.07
1 3 3 162 141 139.5 1.16 1.22 132.59 144.95 177.10 0.91 0.98 0.93
1 4 4 206 138 137.5 1.50 1.49 138.51 142.82 212.41 0.97 0.95 1.02
2 1 5 130 137 138.5 0.94 0.95 137.17 140.68 133.32 0.98 1.00 0.98
2 2 6 50 140 139 0.36 0.34 145.73 138.55 47.54 1.05 1.02 1.03
2 3 7 174 138 137.5 1.27 1.22 142.41 136.42 166.67 1.04 1.03 1.01
2 4 8 198 137 136 1.46 1.49 133.13 134.28 199.72 0.99 1.01 0.98
3 1 9 126 135 133.5 0.94 0.95 132.95 132.15 125.24 1.01 0.98 1.03
3 2 10 42 132 130.5 0.32 0.34 122.41 130.01 44.61 0.94 0.99 0.95
3 3 11 162 129 - - 1.22 132.59 127.88 156.24 1.04 0.99 1.05
3 4 12 186 - - - 1.49 125.06 125.75 187.02 0.99 - -

The point forecasts ŷt = trt × snt are reported in the 11th column of the table.

The ratios clt × irt = yt/(trt × snt) are reported in the 12th column of the table.

The cyclical estimates clt = (clt−1 irt−1 + clt irt + clt+1 irt+1)/3 are reported in the 13th column of the table, and finally the
irregular estimates irt = (clt × irt)/clt are reported in the last column of the table.

Clearly, the time series in this example does not have a well-defined cycle since all the cyclical values are close to one.

4.2.6 Prediction interval


The last point in this section is the computation of a prediction interval for a new observation. The prediction interval of a future
observation yt is given by

ŷt ± Bt [100(1 − α)]

where α is the level of significance, and Bt [100(1 − α)] is the error bound in the 100(1 − α)% prediction interval [trt ± Bt [100(1 − α)]]
for the deseasonalised observation dt = T Rt + ϵt = β0 + β1 t + ϵt .

ACTIVITY 4.6

Consider the data given in Activity 4.5. Determine the 95% prediction interval for the pie sales in the first quarter of year 2004.

DISCUSSION OF ACTIVITY 4.6

The estimate of the trend was found to be trt = 151.3503 − 2.1336t. The first quarter of year 2004 corresponds to t = 13, with
first-quarter seasonal index sn13 = 0.95. The point prediction of the trend at time t = 13 is tr13 = 151.3503 − 2.1336(13) = 123.6135.
Hence, the point forecast (prediction) is

ŷ13 = tr13 × sn13 = [151.3503 − 2.1336(13)] × 0.95 = 117.4328.

The standard error for the regression model dt = T Rt + ϵt = β0 + β1 t + ϵt is:

s = √(SSE/(n − 2)) = √(∑_{t=1}^{12} (dt − trt)²/(12 − 2)) = √(460.1589/10) = 6.7835.

Also,

t_{α/2}^{(n−2)} = t_{0.025}^{(10)} = 2.228

and the average time is

t̄ = (1/12) ∑_{t=1}^{12} t = 6.5,

which implies that (13 − t̄)² = 42.25. Finally, note from the table used for parameter estimation that:

∑_{t=1}^{12} (t − t̄)² = 143.

Hence, the 95% prediction interval of T R13 is:

tr13 ± t_{0.025}^{(10)} s √(1 + 1/12 + (13 − t̄)²/∑_{t=1}^{12}(t − t̄)²) = 123.6135 ± 2.228 × 6.7835 × √(1 + 1/12 + 42.25/143) = [105.8668; 141.3602].

Hence, the error bound for predicting the 13th observation using a 95% confidence level is:

B13 [95] = (141.3602 − 105.8668)/2 = 17.7467.

It follows that an approximate 95% prediction interval for y13 is:

[ŷ13 − 17.7467; ŷ13 + 17.7467] = [117.4328 − 17.7467; 117.4328 + 17.7467] = [99.6861; 135.1795].

EXERCISE
Consider the average number of calls received per day at a Computer Club Warehouse (CCW) call centre for the past three
years. You will also realise that the pattern of the call volumes can be of help in the analysis.

(a) Plot the call volume versus time.

(b) What type of trend and seasonal variation do you observe? Explain your answer.

(c) Perform a time series analysis using the multiplicative decomposition, that is estimate the four time series components.

(d) Calculate the point forecast and the 95% prediction interval for the call volume in the first quarter of the fourth year. The
next table presents the data.

Year Quarter Call volume


1 1 6809
1 2 6465
1 3 6569
1 4 8266
2 1 7257
2 2 7064
2 3 7784
2 4 8724
3 1 6992
3 2 6822
3 3 7949
3 4 9650

If you are unsure whether your answers are correct, discuss them with your fellow students in the Discussion forum on the
module website.

4.3 Additive decomposition
A time series that exhibits constant seasonal variation is modelled using an additive decomposition model given by:

yt = T Rt + SNt + CLt + IRt

with the known notation. The estimation of the four components follows the steps used for a multiplicative decomposition
model. The only differences are the following:

(1) The multiplication sign is replaced by the addition sign.

(2) The division sign is replaced by the subtraction sign.

(3) The sum of the adjusted seasonal indices is zero, not the number of seasons.
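To see how small the change really is, here is a hedged Python sketch of the additive seasonal-index step for the pie-sales data of Activity 4.5: deviations yt − CMAt replace the ratios, and the indices are shifted to sum to zero.

import numpy as np

y = np.array([142, 54, 162, 206, 130, 50, 174, 198, 126, 42, 162, 186], float)
# 4-period moving averages, then centre them (the same two smoothing passes).
cma = np.convolve(np.convolve(y, np.ones(4) / 4, "valid"), [0.5, 0.5], "valid")

dev = y[2:-2] - cma  # subtraction replaces division; aligned with t = 3,...,10
# Group deviations by quarter (starts 2, 3, 0, 1 give quarters 1, 2, 3, 4).
raw = [np.mean(dev[start::4]) for start in (2, 3, 0, 1)]
sn = np.array(raw) - np.mean(raw)  # shift so that the indices sum to zero
print(np.round(sn, 2))  # about [-7.5, -88.25, 30.0, 65.75]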

ACTIVITY 4.7

Consider the following time series data (not real life data, but fictional).

(a) Plot the sales versus time.

(b) What type of trend and seasonal variation do you observe? Explain your answer.

(c) Perform a time series analysis using the additive decomposition, that is estimate the four time series components.

(d) Calculate the point forecast and the 95% prediction interval for the sales in Spring 2022. The next table presents the data.

Year Season Sales


2019 Spring 142
Summer 54
Autumn 162
Winter 206

2020 Spring 130


Summer 50
Autumn 174
Winter 198

2021 Spring 126


Summer 42
Autumn 162
Winter 186

DISCUSSION OF ACTIVITY 4.7

Attempt the activity as an exercise following the same procedure as in Activities 4.5 and 4.6.

4.4 Conclusion
This unit dealt with two methods for the decomposition of time series. The estimation of the four components was done using
moving averages. Two examples were used to illustrate component estimation and point and interval prediction for the multiplicative
decomposition. The additive decomposition was discussed briefly, since the component estimation and predictions follow the same
steps as those used in the multiplicative decomposition. We are now ready for the last unit of the module.

Unit 5

Exponential smoothing

Outcome table for the study unit.


This study unit focuses on exponential smoothing methods. The outcomes of the unit are:

• Apply simple exponential smoothing, tracking signals, Holt’s trend-corrected exponential smoothing, the Holt-Winters
model and the damped trend model.

• Fit each model to available data and describe its appropriateness for a given situation.

• Track and correct identifiable errors in the model used.

Outcomes (at the end of the unit you should be able to) | Assessment | Content | Activities | Feedback
explain methods of smoothing | analyse data | exponential smoothing, smoothing constants, damped trend | perform appropriate calculations | discuss likely errors
perform simple exponential smoothing | explore data with various smoothing constants | simple exponential smoothing | perform calculations and interpret data | explain alternative methods
monitor the forecasting system | measure the strength of forecasts | tracking signals | calculate the statistics and interpret them | discuss the solutions
describe and apply various smoothing approaches | determine the aptness of the various methods | Holt's trend-corrected smoothing, the Holt-Winters method, the damped trend method | perform apt calculations for each method | discuss the solutions
forecast future values of a time series | develop forecast values | Holt's, Holt-Winters and damped trend methods | perform calculations | discuss the solutions

5.1 Introduction
Changing trend and seasonality of a time series over time make forecasting difficult to undertake. This is where exponential
smoothing becomes useful: smoothing constants are used to smooth a rough time series. In this module we study various smoothing
methods, and a tracking method to monitor the process. The methods are simple exponential smoothing, Holt's trend-corrected
exponential smoothing, the Holt-Winters method, and damped trend exponential smoothing.

A common way to characterise exponential smoothing is to consider it as a technique that can be applied to time series data,
either to produce smoothed data that are to be presented, or to develop forecasts. The observed phenomenon may be an essentially
random process, or it may be an orderly, but noisy, process. Different smoothing techniques are available as presented in this unit,
and each one for a specific purpose. For example, the simple moving average weights past observations equally, while exponential
smoothing assigns higher weights to recent observations and lower weights to remote ones.
Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of
repeated measurements.

5.2 Simple exponential smoothing
This method is used when the data pattern is horizontal (i.e., there is neither cyclical variation nor trend in the historical data). Let us
first explore the following model.

The model

yt = β0 + εt .

is used for forecasting when there is no trend or seasonal pattern and the mean of the time series remains constant.

In some statistics textbooks, this equation is commonly written as yt = µ + εt .

The least squares point estimate of the mean β0 is b0 = ȳ, where:

ȳ = (1/n) ∑_{t=1}^{n} yt.

Do you remember this formula? Equal weights are given to each observation, since

ȳ = (1/n) ∑_{t=1}^{n} yt = ∑_{t=1}^{n} (1/n) yt.

Under these conditions, we require a model that would describe the data more suitably, and estimates for the mean that may
change from one time period to the next. Simple exponential smoothing (SES) is one such method; it does not use equal weights.
Instead, more recent observations are given more weight.

ACTIVITY 5.1

Indicate "True" or "False" for each of the following statements about SES. In case of "False", correct the statement. Justify the
correct statements.

(1) Estimate of the mean is constant.

(2) Estimate of the mean changes over time.

(3) Oldest observations receive the most weights.

(4) Newest observations receive the average weight.

DISCUSSION OF ACTIVITY 5.1

(1) Estimate of the mean is constant.


The statement is false. SES is used because of its changing mean to suit the no trend model that has a mean that changes
slowly over time.

(2) Estimate of the mean changes over time.


True. The nature (or formulation) of SES is such that it caters for a mean that changes over time.

(3) Oldest observations receive the most weights.
False. Oldest observations receive the least weights.

(4) Newest observations receive the average weight.


False. They receive the largest weights.

We release you from suspense and define SES formally. Let y1 , y2 , ..., yn be a time series with a mean that changes slowly
over time but has neither trend nor seasonal pattern. Then the estimate for the level (or mean) of the time series in period T is:

ℓT = αyT + (1 − α) ℓT −1

where

α= smoothing constant between 0 and 1, and

ℓT −1 = estimate of the level (or mean) of the time series at time T − 1.

The value of α determines the degree of smoothing and how responsive the model is to fluctuations in the time-series data. The
value is somewhat arbitrary and is determined both by the nature of the data and by the forecaster's sense of what constitutes a good
response rate. A smoothing constant close to zero leads to a stable model, while a constant close to one is highly reactive. Typically,
constant values between 0.01 and 0.3 are used.

Let us illustrate with the data we have seen before in order to feel comfortable at the early stage of SES exploration.

ACTIVITY 5.2

Consider the cod catch data that were discussed in Activity 3.5 of Unit 3. Find the smoothing levels at all the time points, then
calculate the forecasts made in last period, the forecast errors and the squared forecast errors.

DISCUSSION OF ACTIVITY 5.2

If you recall, the plots for these data showed that the data had no trend. To start the recursion
ℓT = αyT + (1 − α) ℓT−1 , we need an initial smoothing level ℓ0 . To our knowledge, there is no consensus in the statistical literature
about the choice of initial value, and different statistical packages use different values. In Excel, ℓ0 = y1 , the first observation.
For the cod catch data, the author of the prescribed textbook chose the mean of the observations in the first year as the value of ℓ0 .
That is:

ℓ0 = (1/12) ∑_{t=1}^{12} y_t = (1/12)(362 + 381 + 317 + ... + 343) = (1/12)(4328) = 360.6667

In order to illustrate, we use α = 0.1. Does it satisfy the given restriction? We now explore by determining the levels from these data.

ℓ1 = αy1 + (1 − α) ℓ0

= (0.1) (362) + (0.9) (360.6667)

= 360.8000

ℓ2 = αy2 + (1 − α) ℓ1

= (0.1) (381) + (0.9) (360.8000)

= 362.8200

These can be calculated further to ℓ24 . Forecast errors can be calculated for all these mean levels. Do you remember the forecast
errors? Detailed results, including the estimates of forecasts made last period, are in the following table.

Time (t) Actual values (yt) Smoothed estimates (lt) Forecasts made last period (Ft) Forecast errors (et) Squared forecast errors (e2t)
0 - 360.6667 - - -
1 362 360.8 360.6667 1.3333 1.777689
2 381 362.82 360.8 20.19997 408.0388
3 317 358.238 362.82 -45.82 2099.475
4 297 352.1142 358.238 -61.238 3750.096
5 399 356.8028 352.1142 46.88578 2198.276
6 402 361.3225 356.8028 45.1972 2042.787
7 375 362.6903 361.3225 13.67748 187.0735
8 349 361.3212 362.6903 -13.6903 187.4234
9 386 363.7891 361.3212 24.67876 609.0411
10 328 360.2102 363.7891 -35.7891 1280.861
11 389 363.0892 360.2102 28.78979 828.8523
12 343 361.0803 363.0892 -20.0892 403.5753
13 276 352.5722 361.0803 -85.0803 7238.652
14 334 350.715 352.5722 -18.5722 344.9281
15 394 355.0435 350.715 43.28498 1873.59
16 334 352.9392 355.0435 -21.0435 442.8295
17 384 356.0452 352.9392 31.06084 964.7756
18 314 351.8407 356.0452 -42.0452 1767.803
19 344 351.0566 351.8407 -7.84072 61.47692
20 337 349.651 351.0566 -14.0566 197.5894
21 345 349.1859 349.651 -4.65098 21.63166
22 362 350.4673 349.1859 12.81411 164.2015
23 314 346.8206 350.4673 -36.4673 1329.864
24 365 348.6385 346.8206 18.17943 330.4918
Total 28735.11

In SES, a point forecast at time T of any future value yT +τ is the last estimate ℓT for the mean of the time series since there is
no trend or seasonal pattern to observe. That is:

ŷT +τ = ℓT (τ = 1, 2, 3, ...)

ACTIVITY 5.3

Write down the point forecast made in time period t − 1 of the value yt.

DISCUSSION OF ACTIVITY 5.3

Because there is no trend and no seasonal pattern, we have ŷt(t − 1) = ℓt−1.

We dealt with the standard error(s) and the sum of squares for error (SSE) in the earlier chapters. In the current setting, the standard error at time T is:
s = √(SSE/(T − 1)) = √( ∑_{t=1}^{T} (yt − ℓt−1)² / (T − 1) )
For any τ , a 95% prediction interval computed in time period T for yT +τ is:
[ℓT − z[0.025] s √(1 + (τ − 1)α²) ; ℓT + z[0.025] s √(1 + (τ − 1)α²)]

ACTIVITY 5.4

Write down the formula for a 95% prediction interval computed in time period T for yT +τ when:

(i) τ = 1
(ii) τ = 2.

DISCUSSION OF ACTIVITY 5.4

The solutions are found by substitution. That is:

(i) For τ = 1, the 95% prediction interval for yT +1 is:

[lT − z[0.025] s; lT + z[0.025] s].

(ii) For τ = 2, the prediction interval for yT+2 is: [ℓT − z[0.025] s √(1 + α²) ; ℓT + z[0.025] s √(1 + α²)].

We can go on and experiment with different values of τ , but it is a theoretical exercise more than it is an application. For this
reason, let us consider a practical example by revisiting the Cod Catch data discussed in Unit 3, Activity 3.5.

ACTIVITY 5.5

Consider the cod catch data that were discussed in Unit 3. Find the point predictions and the 95% prediction intervals made in month 24 for months 25, 26 and 27.

DISCUSSION OF ACTIVITY 5.5

In Activity 5.2, we arbitrarily used α = 0.1 as the smoothing constant. However, using Solver in Excel gives 0.034 as the optimal value of α that minimises the error sum of squares (SSE). With α = 0.034, the values of lt, Ft, et = yt − lt−1, e²t and SSE are given in the following table.

Time (t) Actual values (yt) Smoothed estimates (lt) Forecasts made last period (Ft) Forecast errors (et) Squared forecast errors (e2t)
0 - 360.6667 - - -
1 362 360.7120 360.6667 1.3333 1.7777
2 381 361.4018 360.7120 20.2880 411.6016
3 317 359.8922 361.4018 -44.4018 1971.5219
4 297 357.7538 359.8922 -62.8922 3955.4239
5 399 359.1562 357.7538 41.2462 1701.2467
6 402 360.6129 359.1562 42.8438 1835.5914
7 375 361.1020 360.6129 14.3871 206.9890
8 349 360.6906 361.1020 -12.1020 146.4596
9 386 361.5511 360.6906 25.3094 640.5668
10 328 360.4104 361.5511 -33.5511 1125.6763
11 389 361.3824 360.4104 28.5896 817.3674
12 343 360.7574 361.3824 -18.3824 337.9130
13 276 357.8757 360.7574 -84.7574 7183.8182
14 334 357.0639 357.8757 -23.8757 570.0469
15 394 358.3197 357.0639 36.9361 1364.2767
16 334 357.4928 358.3197 -24.3197 591.4484
17 384 358.3941 357.4928 26.5072 702.6295
18 314 356.8847 358.3941 -44.3941 1970.8348
19 344 356.4466 356.8847 -12.8847 166.0151
20 337 355.7854 356.4466 -19.4466 378.1705
21 345 355.4187 355.7854 -10.7854 116.3253
22 362 355.6425 355.4187 6.5813 43.3133
23 314 354.2266 355.6425 -41.6425 1734.0962
24 365 354.5929 354.2266 10.7734 116.0654
Total - - - -120.28190 28089.1756

The table indicates that l24 = 354.5929.


Therefore, for ŷ25 and other future monthly cod catches,

ŷ24+τ = ℓ24 = 354.5929.

For prediction intervals we need the value of the standard error. The standard error is:
s = √(SSE/(T − 1)) = √( ∑_{t=1}^{T} (yt − ℓt−1)² / (T − 1) ) = √(28089.1756 / 23) = 34.9465.

(1) A 95% prediction interval made in month 24 for Y25 is:

[l24 ± z0.025 s] = 354.5929 ± 1.96(34.9465) = [286.0978; 423.0880].

(2) A 95% prediction interval made in month 24 for Y26 is:


[l24 ± z0.025 s √(1 + α²)] = 354.5929 ± 1.96(34.9465) √(1 + (0.034)²) = [286.0582; 423.1276].

(3) A 95% prediction interval made in month 24 for Y27 is:


[l24 ± z0.025 s √(1 + 2α²)] = 354.5929 ± 1.96(34.9465) √(1 + 2(0.034)²) = [286.0186; 423.1672].

The information obtained from this activity is that when α is small, the limits, and therefore the lengths, of the prediction intervals for future values are practically the same.
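The calculations in this activity can be checked with a short Python sketch. It recomputes the SSE, the standard error s and the three 95% prediction intervals from the cod catch data and the quantities quoted above; the output should agree with the table and the intervals up to rounding.

import math

# Cod catch data (t = 1, ..., 24), optimal alpha and l0 from the activity
y = [362, 381, 317, 297, 399, 402, 375, 349, 386, 328, 389, 343,
     276, 334, 394, 334, 384, 314, 344, 337, 345, 362, 314, 365]
alpha, level = 0.034, 360.6667

sse = 0.0
for obs in y:
    error = obs - level                  # e_t = y_t - l_{t-1}
    sse += error ** 2
    level = alpha * obs + (1 - alpha) * level

s = math.sqrt(sse / (len(y) - 1))        # approx. 34.9465
for tau in (1, 2, 3):                    # 95% PIs made in month 24 for months 25-27
    half = 1.96 * s * math.sqrt(1 + (tau - 1) * alpha ** 2)
    print(round(level - half, 4), round(level + half, 4))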

ACTIVITY 5.6

Write down the model ℓT = αyT + (1 − α) ℓT −1 in terms of ℓT −1 and yT − ℓT −1 , then give an interpretation.

DISCUSSION OF ACTIVITY 5.6

ℓT = αyT + (1 − α) ℓT −1

= αyT + ℓT −1 − αℓT −1

= ℓT −1 + αyT − αℓT −1

= ℓT −1 + α (yT − ℓT −1 )

This form is called the error correction form. It means that the smoothing level lT at time T is the sum of the smoothing level
lT −1 at the previous time, T − 1, plus a fraction α of the one-period-ahead forecast error yT − lT −1 .

5.3 Tracking signals


Sometimes changing the smoothing constant in SES may improve the forecasts. A tracking signal can be used to decide when something is wrong with a forecasting system, such as when an inappropriate smoothing constant is used. A forecasting system is not expected to produce perfect forecasts, but if the forecasts deviate more than is acceptable, a tracking signal can tell us so.
A tracking signal is thus an indicator to inform us that things are right or wrong. It is a technique to monitor the performance
of our forecasting procedures. Basically, it indicates if the forecast is consistently biased high or low. The description of a tracking
signal is as follows:
Let e1 (α) , e2 (α) , ..., eT (α) be T single-period-ahead forecast errors, where (α) denotes the particular value of α employed
to obtain the single-period-ahead forecast errors. The sum of forecast errors is defined as:

Y(α, T) = ∑_{t=1}^{T} et(α).

ACTIVITY 5.7

Determine the sum of forecast errors for T = 24 using the cod catch data.

DISCUSSION OF ACTIVITY 5.7

By definition, a single-period-ahead forecast error is given by eT(α) = yT − lT−1. Hence, the forecast errors indicated in the table under Activity 5.5 are in the column with heading “Forecast errors”. Their sum is −120.28190.

ACTIVITY 5.8

Show that:

Y (α, T ) = Y (α, T − 1) + eT (α) .

DISCUSSION OF ACTIVITY 5.8

Using the definition appropriately, we have:


Y(α, T) = ∑_{t=1}^{T} et(α) = ∑_{t=1}^{T−1} et(α) + eT(α) = Y(α, T − 1) + eT(α)

One of the tracking signal instruments is the simple cusum (cumulative sum) tracking signal C(α, T) defined by

C(α, T) = | Y(α, T) / MAD(α, T) |

where MAD(α, T) = α|eT(α)| + (1 − α)MAD(α, T − 1) is the smoothed mean absolute deviation (MAD).

Remember that:
MAD = (∑_{t=1}^{n} |et|) / n

We also keep track of the running (unsmoothed) mean absolute deviation up to time T. That is:


MADT = (∑_{t=1}^{T} |et(α)|) / T

If C(α, T) is large, then the sum of forecast errors Y(α, T) is large relative to the mean absolute deviation MAD(α, T). This means that the forecasting system produces errors that are either consistently positive or consistently negative; that is, forecasts that are consistently smaller or consistently larger than the actual time series values. If the forecasting system is accurate, it should produce (at least approximately) an equal number of negative and positive errors. Thus, a large C(α, T) indicates that the forecasting system does not perform accurately. Note that we still have not quantified what a large value of C(α, T) means. There are no hard and fast rules for it; a limit will be given with every situation. However, there is a rule of thumb based on a predefined control limit K. If C(α, T) exceeds K for two or more consecutive periods, this is an indication that the forecast errors are larger than the ones expected for an accurate forecasting system. The following table gives the control limit K for selected smoothing constants, for a 5% and a 1% chance of observing a larger-than-normal value of C(α, T).

5% 1%
α 0.1 0.2 0.3 0.1 0.2 0.3
K 5.6 4.1 3.5 7.5 5.6 4.9
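A minimal Python sketch of the simple cusum tracking signal follows; it is written to match the conventions used in the table of Activity 5.9 below, in particular the assumption that the smoothed MAD is initialised at |e1| and then updated recursively.

def cusum_signal(errors, alpha=0.1):
    """Return C(alpha, T) for T = 1, ..., len(errors)."""
    cum_err = 0.0                 # Y(alpha, T): running sum of forecast errors
    mad = abs(errors[0])          # MAD(alpha, 1) = |e_1|
    signals = []
    for T, e in enumerate(errors, start=1):
        cum_err += e
        if T > 1:
            # MAD(alpha, T) = alpha*|e_T| + (1 - alpha)*MAD(alpha, T - 1)
            mad = alpha * abs(e) + (1 - alpha) * mad
        signals.append(abs(cum_err / mad))
    return signals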

ACTIVITY 5.9

Consider the Cod Catch data discussed in Unit 3 and in the above activities.

(a) Calculate the simple cusum tracking signal for all the 24 time points.

(b) Is the forecasting appropriate for the data? Explain your answer. Use both 5% and 1% as the significance levels.

DISCUSSION OF ACTIVITY 5.9

The important quantities needed to calculate values of the simple cusum tracking are given in the following table:

t yt lt Ft et |et| Y(α,T) Σ|et| MADT MAD(α,T) C(α,T)
0 360.67
1 362 360.71 360.67 1.33 1.33 1.33 1.33 1.33 1.33 1.00
2 381 361.40 360.71 20.29 20.29 21.62 21.62 10.81 3.23 6.70
3 317 359.89 361.40 -44.40 44.40 -22.78 66.02 22.01 7.35 3.10
4 297 357.75 359.89 -62.89 62.89 -85.67 128.92 32.23 12.90 6.64
5 399 359.16 357.75 41.25 41.25 -44.43 170.16 34.03 15.74 2.82
6 402 360.61 359.16 42.84 42.84 -1.58 213.01 35.50 18.45 0.09
7 375 361.10 360.61 14.39 14.39 12.80 227.39 32.48 18.04 0.71
8 349 360.69 361.10 -12.10 12.10 0.70 239.49 29.94 17.45 0.04
9 386 361.55 360.69 25.31 25.31 26.01 264.80 29.42 18.23 1.43
10 328 360.41 361.55 -33.55 33.55 -7.54 298.35 29.84 19.76 0.38
11 389 361.38 360.41 28.59 28.59 21.05 326.94 29.72 20.65 1.02
12 343 360.76 361.38 -18.38 18.38 2.67 345.33 28.78 20.42 0.13
13 276 357.88 360.76 -84.76 84.76 -82.09 430.08 33.08 26.85 3.06
14 334 357.06 357.88 -23.88 23.88 -105.97 453.96 32.43 26.56 3.99
15 394 358.32 357.06 36.94 36.94 -69.03 490.90 32.73 27.59 2.50
16 334 357.49 358.32 -24.32 24.32 -93.35 515.22 32.20 27.27 3.42
17 384 358.39 357.49 26.51 26.51 -66.84 541.72 31.87 27.19 2.46
18 314 356.88 358.39 -44.39 44.39 -111.24 586.12 32.56 28.91 3.85
19 344 356.45 356.88 -12.88 12.88 -124.12 599.00 31.53 27.31 4.55
20 337 355.79 356.45 -19.45 19.45 -143.57 618.45 30.92 26.52 5.41
21 345 355.42 355.79 -10.79 10.79 -154.35 629.23 29.96 24.95 6.19
22 362 355.64 355.42 6.58 6.58 -147.77 635.82 28.90 23.11 6.39
23 314 354.23 355.64 -41.64 41.64 -189.41 677.46 29.45 24.97 7.59
24 365 354.59 354.23 10.77 10.77 -178.64 688.23 28.68 23.55 7.59

The unusual headings in the table above are described as follows:

Y(α, T) = ∑_{t=1}^{T} et(α) for T = 1, 2, . . . , 24

MADT = (∑_{t=1}^{T} |et|) / T for T = 1, 2, . . . , 24

MAD(α, T) = 0.1|eT| + 0.9 MAD(α, T − 1), with MAD(α, 1) = |e1|

C(α, T) = | Y(α, T) / MAD(α, T) |

For a smoothing constant α = 0.1, the control limits at the 5% and 1% levels of significance are respectively K = 5.6 and K = 7.5. The simple cusum tracking signals greater than K = 5.6 occur at times t = 2, 4, 21, 22, 23, 24; that is, six observations. The simple cusum tracking signals greater than K = 7.5 occur at times t = 23, 24, although the excess over K is very small. Since C(α, T) > K in two or more consecutive time periods, we conclude that the forecasting process is not accurate at both the 5% and the 1% levels of significance. However, the deviation from accuracy is less severe at the 1% level of significance.

Another tracking signal that has been used extensively in practice is the smoothed error tracking signal, defined as follows:
First define the smoothed error (E) of the one-period-ahead forecast errors as:

E(α, T ) = αeT (α) + (1 − α)E(α, T − 1).

Then, the smoothed error tracking signal is defined as:

S(α, T) = | E(α, T) / MAD(α, T) |.
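A matching Python sketch for the smoothed error tracking signal, under the same initialisation assumptions as the cusum sketch given earlier:

def smoothed_error_signal(errors, alpha=0.1):
    """Return S(alpha, T) for T = 1, ..., len(errors)."""
    E = errors[0]                 # E(alpha, 1) initialised at e_1
    mad = abs(errors[0])          # MAD(alpha, 1) initialised at |e_1|
    signals = [abs(E / mad)]
    for e in errors[1:]:
        E = alpha * e + (1 - alpha) * E
        mad = alpha * abs(e) + (1 - alpha) * mad
        signals.append(abs(E / mad))
    return signals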
ACTIVITY 5.10

Consider the Cod Catch data discussed in Unit 3 and in the above activities.

(a) Calculate the smoothed error tracking signal for all the 24 time points.

(b) Is the forecasting appropriate for the data? Explain your answer. Use both 5% and 1% as the significance levels.

DISCUSSION OF ACTIVITY 5.10

The important quantities needed to calculate values of the smoothed error tracking signal are given in the following table:
t yt lt Ft et |et| Y(α,T) Σ|et| MADT MAD C E S
0 360.67
1 362 360.71 360.67 1.33 1.33 1.33 1.33 1.33 1.33 1.00 1.33 1.00
2 381 361.40 360.71 20.29 20.29 21.62 21.62 10.81 3.23 6.70 3.23 1.00
3 317 359.89 361.40 -44.40 44.40 -22.78 66.02 22.01 7.35 3.10 -1.53 0.21
4 297 357.75 359.89 -62.89 62.89 -85.67 128.92 32.23 12.90 6.64 -7.67 0.59
5 399 359.16 357.75 41.25 41.25 -44.43 170.16 34.03 15.74 2.82 -2.78 0.18
6 402 360.61 359.16 42.84 42.84 -1.58 213.01 35.50 18.45 0.09 1.78 0.10
7 375 361.10 360.61 14.39 14.39 12.80 227.39 32.48 18.04 0.71 3.04 0.17
8 349 360.69 361.10 -12.10 12.10 0.70 239.49 29.94 17.45 0.04 1.53 0.09
9 386 361.55 360.69 25.31 25.31 26.01 264.80 29.42 18.23 1.43 3.91 0.21
10 328 360.41 361.55 -33.55 33.55 -7.54 298.35 29.84 19.76 0.38 0.16 0.01
11 389 361.38 360.41 28.59 28.59 21.05 326.94 29.72 20.65 1.02 3.00 0.15
12 343 360.76 361.38 -18.38 18.38 2.67 345.33 28.78 20.42 0.13 0.87 0.04
13 276 357.88 360.76 -84.76 84.76 -82.09 430.08 33.08 26.85 3.06 -7.70 0.29
14 334 357.06 357.88 -23.88 23.88 -105.97 453.96 32.43 26.56 3.99 -9.31 0.35
15 394 358.32 357.06 36.94 36.94 -69.03 490.90 32.73 27.59 2.50 -4.69 0.17
16 334 357.49 358.32 -24.32 24.32 -93.35 515.22 32.20 27.27 3.42 -6.65 0.24
17 384 358.39 357.49 26.51 26.51 -66.84 541.72 31.87 27.19 2.46 -3.34 0.12
18 314 356.88 358.39 -44.39 44.39 -111.24 586.12 32.56 28.91 3.85 -7.44 0.26
19 344 356.45 356.88 -12.88 12.88 -124.12 599.00 31.53 27.31 4.55 -7.99 0.29
20 337 355.79 356.45 -19.45 19.45 -143.57 618.45 30.92 26.52 5.41 -9.13 0.34
21 345 355.42 355.79 -10.79 10.79 -154.35 629.23 29.96 24.95 6.19 -9.30 0.37
22 362 355.64 355.42 6.58 6.58 -147.77 635.82 28.90 23.11 6.39 -7.71 0.33
23 314 354.23 355.64 -41.64 41.64 -189.41 677.46 29.45 24.97 7.59 -11.10 0.44
24 365 354.59 354.23 10.77 10.77 -178.64 688.23 28.68 23.55 7.59 -8.92 0.38

In this table, C, E, S stand for C(α, T ), E(α, T ) and S(α, T ), respectively.


For a smoothing constant α = 0.1, the control limits at 5% and 1% levels of significance are respectively K = 5.6 and K = 7.5.
Since S(α, T ) is less than K = 5.6 and K = 7.5 in all time periods, we conclude that the forecasting process is accurate for both
the 5% and the 1% levels of significance.

5.4 Holt’s trend corrected exponential smoothing
SES cannot handle a time series that displays a trend. If the time series is increasing or decreasing at a fixed rate, it may be described by the linear trend model:

yt = β0 + β1 t + εt .

The level (or mean) at time T is β0 + β1 T and that at time T − 1 is β0 + β1 (T − 1).

ACTIVITY 5.11

Show that the change in level of the time series from time period T − 1 to time period T is β1 .
DISCUSSION OF ACTIVITY 5.11

The change is simply the difference between the trend value at time T and the trend value at time T − 1. That is:

[β0 + β1 T] − [β0 + β1 (T − 1)] = β0 + β1 T − β0 − β1 T + β1 = β1.

Growth rate
Now, regardless of whether the change β1 leads to an increase or a decrease, it is called the growth rate.
Holt’s trend corrected exponential smoothing is appropriate when both the level and the growth rate are changing. For the
Holt’s trend corrected exponential smoothing, let ℓT −1 be the estimate of the level of the time series in time period T − 1 and bT −1
be the corresponding estimate of the growth rate. If we observe a new time series value yT in time period T, two smoothing equations are required to update these two estimates.
The estimate of the level of the time series in time period T uses the smoothing constant α and is:

ℓT = αyT + (1 − α) (ℓT −1 + bT −1 )

The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:

bT = γ (ℓT − ℓT −1 ) + (1 − γ) (bT −1 )

A point forecast made in time period T for yT +τ is:

ŷT +τ = ℓT + τ bT (τ = 1, 2, 3, ...)
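A minimal Python sketch of Holt's recursions and the point forecast is given below; the initial values l0 and b0 are assumed to come from a regression fit, as in Activity 5.14, and the function name is illustrative only.

def holt_forecast(y, alpha, gamma, l0, b0, tau=1):
    """Run Holt's recursions over y and return the point forecast for period T + tau."""
    level, growth = l0, b0
    for obs in y:
        prev_level = level
        # l_T = alpha*y_T + (1 - alpha)*(l_{T-1} + b_{T-1})
        level = alpha * obs + (1 - alpha) * (level + growth)
        # b_T = gamma*(l_T - l_{T-1}) + (1 - gamma)*b_{T-1}
        growth = gamma * (level - prev_level) + (1 - gamma) * growth
    return level + tau * growth           # y-hat_{T+tau} = l_T + tau*b_T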

The standard error s is given as:


s = √(SSE/(T − 2)) = √( ∑_{t=1}^{T} [yt − (ℓt−1 + bt−1)]² / (T − 2) )

If τ = 1, then a 95% prediction interval computed in time period T for yT +τ is:

[(ℓT + bT ) − z0.025 s, (ℓT + bT ) + z0.025 s]

In general, for τ ≥ 2, a 95% prediction interval computed in time period T for yT +τ is:

[(ℓT + τbT) − z0.025 s √(1 + ∑_{j=1}^{τ−1} α²(1 + jγ)²) ; (ℓT + τbT) + z0.025 s √(1 + ∑_{j=1}^{τ−1} α²(1 + jγ)²)]
ACTIVITY 5.12

Write down the formula for a 95% prediction interval computed in time period T for yT +τ when:

(i) τ = 2
(ii) τ = 3.

DISCUSSION OF ACTIVITY 5.12

(i) For τ = 2, the above general formula for the 95% prediction interval for yT+2 can be written as:

lT + 2bT ± z0.025 s √(1 + α²(1 + γ)²) = [lT + 2bT − z0.025 s √(1 + α²(1 + γ)²) ; lT + 2bT + z0.025 s √(1 + α²(1 + γ)²)].

(ii) For τ = 3, the general formula for the 95% prediction interval for yT+3 can be written as:

lT + 3bT ± z0.025 s √(1 + α²(1 + γ)² + α²(1 + 2γ)²),

which, in turn, can be written as:

[lT + 3bT − z0.025 s √(1 + α²(1 + γ)² + α²(1 + 2γ)²) ; lT + 3bT + z0.025 s √(1 + α²(1 + γ)² + α²(1 + 2γ)²)].

ACTIVITY 5.13

Show that the Holt’s trend corrected exponential smoothing equations can be written in the following error correction forms:

lT = lT−1 + bT−1 + α[yT − (lT−1 + bT−1)]
bT = bT−1 + αγ[yT − (lT−1 + bT−1)].

DISCUSSION OF ACTIVITY 5.13

The first equation is found as follows:

lT = αyT + (1 − α)[lT −1 + bT −1 ]
= lT −1 + bT −1 + αyT − α[lT −1 + bT −1 ]
= lT −1 + bT −1 + α[yT − (lT −1 + bT −1 )].
The establishment of the second equation is left to you as an exercise.
ACTIVITY 5.14

The following data can be found in Example 8.3 in the prescribed textbook. The data concern thermostats sold over a period of 52 weeks.

206 189 172 255
245 244 210 303
185 209 205 282
169 207 244 291
162 211 218 280
177 210 182 255
207 173 206 312
216 194 211 296
193 234 273 307
230 156 248 281
212 206 262 308
192 188 258 280
162 162 233 345

The objective of the study was to find the point forecast and the 95% prediction intervals of the number of thermostats to be sold in weeks 53, 54 and 55.

DISCUSSION OF ACTIVITY 5.14

The plot of sales versus time (in weeks) is the following:

The graph indicates an overall increasing trend, mainly towards the end, but the growth rate has been changing over the 52 weeks. Therefore, Holt's trend corrected exponential smoothing can be used to analyse the data. The author of the textbook suggested using α = 0.2 and γ = 0.1 as smoothing constants. We must first find l0 = β̂0 and b0 = β̂1 by fitting, to the first 26 observations, the model yt = β0 + β1 t + ϵt, where yt represents sales at time t, t = 1, 2, . . . , 52, and ϵt is the error term at time t, assumed to have mean 0.

Least squares estimation using Excel gives the following output:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.1118
R Square 0.0125
Adjusted R Square -0.0287
Standard Error 25.5552
Observations 26
ANOVA
df SS MS F Significance F
Regression 1 198.2785 198.2785 0.3036 0.5867
Residual 24 15673.6062 653.0669
Total 25 15871.8846
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 202.6246 10.3199 19.6344 0.0000 181.3254 223.9238
t -0.3682 0.6682 -0.5510 0.5867 -1.7474 1.0110

The fitted model is: ŷt = 202.6246 − 0.3682t. This model shows that, in general, the trend is decreasing. It follows that
l0 = β̂0 = 202.6246 and b0 = β̂1 = −0.3682. We can now calculate the smoothing levels lt and the growth rates bt using the
equations

ℓT = αyT + (1 − α) (ℓT −1 + bT −1 )

and

bT = γ (ℓT − ℓT −1 ) + (1 − γ) (bT −1 )

where α = 0.2, γ = 0.1, l0 = 202.6246 and b0 = −0.3682. For instance, l1 , b1 , l2 and b2 are calculated as follows:

l1 = αy1 + (1 − α)(l0 + b0 ) = 0.2(206) + 0.8(202.6246 − 0.3682) = 203.0051.

b1 = γ(l1 − l0 ) + (1 − γ)b0 = 0.1(203.0051 − 202.6246) + 0.9(−0.3682) = −0.2933.

l2 = αy2 + (1 − α)(l1 + b1 ) = 0.2(245) + 0.8(203.0051 − 0.2933) = 211.1694.

b2 = γ(l2 − l1 ) + (1 − γ)b1 = 0.1(211.1694 − 203.0051) + 0.9(−0.2933) = 0.55246.

Continuing the process until t = 52 gives the values in columns 3 and 4 in the following table:

Week Sales Level Growth rate Forecast made last period Forecast error Squared forecast error
0 - 202.6246 -0.3682 - - -
1 206 203.0051 -0.2933 202.2564 3.7436 14.0145
2 245 211.1694 0.5524 202.7118 42.2882 1788.2925
3 185 206.3775 0.0180 211.7219 -26.7219 714.0583
4 169 198.9164 -0.7299 206.3955 -37.3955 1398.4230
5 162 190.9492 -1.4536 198.1865 -36.1865 1309.4617
6 177 186.9964 -1.7036 189.4955 -12.4955 156.1387
7 207 189.6343 -1.2694 185.2929 21.7071 471.1988
8 216 193.8919 -0.7167 188.3649 27.6351 763.6988
9 193 193.1402 -0.7202 193.1752 -0.1752 0.0307
10 230 199.9360 0.0314 192.4200 37.5800 1412.2596
11 212 202.3739 0.2720 199.9674 12.0326 144.7845
12 192 200.5167 0.0591 202.6459 -10.6459 113.3357
13 162 192.8607 -0.7124 200.5759 -38.5759 1488.0973
14 189 191.5186 -0.7754 192.1483 -3.1483 9.9118
15 244 201.3946 0.2898 190.7433 53.2567 2836.2784
16 209 203.1475 0.4361 201.6844 7.3156 53.5180
17 207 204.2669 0.5044 203.5836 3.4164 11.6718
18 211 206.0170 0.6290 204.7713 6.2287 38.7967
19 210 207.3168 0.6961 206.6460 3.3540 11.2491
20 173 201.0103 -0.0042 208.0129 -35.0129 1225.9025
21 194 199.6049 -0.1443 201.0061 -7.0061 49.0858
22 234 206.3685 0.5465 199.4606 34.5394 1192.9711
23 156 196.7320 -0.4718 206.9149 -50.9149 2592.3316
24 206 198.2081 -0.2770 196.2601 9.7399 94.8650
25 188 195.9449 -0.4756 197.9311 -9.9311 98.6264
26 162 188.7754 -1.1450 195.4692 -33.4692 1120.1885
27 172 184.5043 -1.4576 187.6303 -15.6303 244.3076
28 210 188.4373 -0.9186 183.0466 26.9534 726.4838
29 205 191.0150 -0.5689 187.5187 17.4813 305.5945
30 244 201.1568 0.5021 190.4460 53.5540 2868.0261
31 218 204.9272 0.8290 201.6590 16.3410 267.0293
32 182 201.0049 0.3538 205.7561 -23.7561 564.3537
33 206 202.2870 0.4467 201.3587 4.6413 21.5413
34 211 204.3869 0.6120 202.7336 8.2664 68.3326
35 273 218.5991 1.9720 204.9989 68.0011 4624.1497
36 248 226.0569 2.5206 220.5711 27.4289 752.3432
37 262 235.2620 3.1890 228.5775 33.4225 1117.0646
38 258 242.3608 3.5800 238.4510 19.5490 382.1626
39 233 243.3527 3.3212 245.9408 -12.9408 167.4651
40 255 248.3391 3.4877 246.6739 8.3261 69.3246
41 303 262.0614 4.5112 251.8268 51.1732 2618.6956
42 282 269.6581 4.8197 266.5726 15.4274 238.0038
43 291 277.7823 5.1502 274.4778 16.5222 272.9820
44 280 282.3460 5.0915 282.9324 -2.9324 8.5992
45 255 280.9500 4.4428 287.4375 -32.4375 1052.1900
46 312 290.7142 4.9749 285.3928 26.6072 707.9453
47 296 295.7513 4.9811 295.6891 0.3109 0.0966

48 307 301.9860 5.1065 300.7324 6.2676 39.2823
49 281 301.8740 4.5846 307.0924 -26.0924 680.8155
50 308 306.7669 4.6155 306.4586 1.5414 2.3759
51 280 305.1059 3.9878 311.3823 -31.3823 984.8514
52 345 316.2750 4.7059 309.0937 35.9063 1289.2627
Total 39182.4700

Solver in Excel has been used to find optimal values of α and γ that minimise SSE. The optimal values are α = 0.247 and γ = 0.095. These
new smoothing constants lead to the following table:

Week Sales Level Growth rate Forecast made last period Forecast Error Squared forecast error
0 202.6246 -0.3682
1 206 203.1811 -0.2804 202.2564 3.7436 14.0145
2 245 213.2992 0.7075 202.9007 42.0993 1772.3500
3 185 206.8421 0.0269 214.0067 -29.0067 841.3910
4 169 197.5153 -0.8617 206.8689 -37.8689 1434.0563
5 162 188.0941 -1.6749 196.6536 -34.6536 1200.8702
6 177 184.0927 -1.8959 186.4193 -9.4193 88.7225
7 207 188.3232 -1.3139 182.1968 24.8032 615.1987
8 216 194.1700 -0.6336 187.0093 28.9907 840.4610
9 193 193.4039 -0.6462 193.5364 -0.5364 0.2877
10 230 201.9565 0.2277 192.7577 37.2423 1386.9911
11 212 204.6087 0.4580 202.1842 9.8158 96.3499
12 192 201.8392 0.1514 205.0667 -13.0667 170.7388
13 162 192.1129 -0.7870 201.9906 -39.9906 1599.2500
14 189 190.7514 -0.8416 191.3260 -2.3260 5.4101
15 244 203.2701 0.4277 189.9099 54.0901 2925.7413
16 209 205.0074 0.5521 203.6978 5.3022 28.1134
17 207 205.9153 0.5859 205.5595 1.4405 2.0750
18 211 207.6124 0.6914 206.5012 4.4988 20.2393
19 210 208.7228 0.7312 208.3038 1.6962 2.8770
20 173 200.4499 -0.1242 209.4540 -36.4540 1328.8965
21 194 198.7633 -0.2726 200.3257 -6.3257 40.0149
22 234 207.2615 0.5606 198.4907 35.5093 1260.9109
23 156 195.0221 -0.6554 207.8221 -51.8221 2685.5333
24 206 197.2401 -0.3824 194.3667 11.6333 135.3337
25 188 194.6699 -0.5902 196.8577 -8.8577 78.4594
26 162 186.1560 -1.3430 194.0796 -32.0796 1029.1031
27 172 181.6482 -1.6436 184.8130 -12.8130 164.1725
28 210 187.4134 -0.9398 180.0045 29.9955 899.7281
29 205 191.0496 -0.5051 186.4736 18.5264 343.2270
30 244 203.7480 0.7493 190.5446 53.4554 2857.4848
31 218 207.8325 1.0661 204.4973 13.5027 182.3228
32 182 202.2546 0.4349 208.8986 -26.8986 723.5328
33 206 203.5072 0.5126 202.6895 3.3105 10.9591

34 211 205.7439 0.6764 204.0198 6.9802 48.7229
35 273 222.8655 2.2387 206.4203 66.5797 4432.8540
36 248 230.7594 2.7759 225.1042 22.8958 524.2185
37 262 240.5661 3.4439 233.5354 28.4646 810.2345
38 258 247.4655 3.7721 244.0100 13.9900 195.7201
39 233 246.7330 3.3442 251.2377 -18.2377 332.6122
40 255 251.2931 3.4597 250.0771 4.9229 24.2345
41 303 266.6698 4.5918 254.7528 48.2472 2327.7936
42 282 273.9140 4.8438 271.2617 10.7383 115.3118
43 291 281.7816 5.1311 278.7578 12.2422 149.8707
44 280 285.2053 4.9689 286.9127 -6.9127 47.7855
45 255 281.4861 4.1435 290.1741 -35.1741 1237.2185
46 312 292.1431 4.7623 285.6296 26.3704 695.3980
47 296 296.6817 4.7410 296.9054 -0.9054 0.8197
48 307 302.8003 4.8719 301.4228 5.5772 31.1056
49 281 301.0842 4.2460 307.6722 -26.6722 711.4083
50 308 305.9897 4.3087 305.3302 2.6698 7.1277
51 280 302.8147 3.5977 310.2983 -30.2983 917.9895
52 345 315.9435 4.5032 306.4124 38.5876 1489.0045
Total 38884.2466

The point forecasts in week 52 for weeks 53, 54 and 55 obtained using

ŷT +τ = ℓT + τ bT (τ = 1, 2, 3, ...)

are respectively:

ŷ53 (52) = l52 + 1b52 = 315.9435 + 4.5032 = 320.4467.


ŷ54 (52) = l52 + 2b52 = 315.9435 + 2(4.5032) = 324.9499.
ŷ55 (52) = l52 + 3b52 = 315.9435 + 3(4.5032) = 329.4531.

To calculate the 95% prediction intervals, we first have to calculate the standard error; that is:
s = √(SSE/(T − 2)) = √( ∑_{t=1}^{T} [yt − (ℓt−1 + bt−1)]² / (T − 2) ) = √(38884.2466 / 50) = 27.8870.

Hence, the 95% prediction interval in week 52 for y53 is:

ŷ53 (52) ± z0.025 s = 320.4467 ± 1.96(27.8870) = [265.7882; 375.1052].

Similarly, the 95% prediction interval in week 52 for y54 is:

ŷ54 (52) ± z0.025 s √(1 + α²(1 + γ)²) = 324.9499 ± 1.96(27.8870) √(1 + (0.247)²(1 + 0.095)²) = [268.3275; 381.5723].

Finally, the 95% prediction interval in week 52 for y55 is: ŷ55 (52) ± z0.025 s √(1 + α²(1 + γ)² + α²(1 + 2γ)²).

That is:

329.4531 ± 1.96(27.8870) √(1 + (0.247)²(1 + 0.095)² + (0.247)²(1 + 2(0.095))²) = 329.4531 ± 58.8575 = [270.5956; 388.3106].
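These week-52 results can be verified with the following Python sketch, which plugs the quantities quoted above (l52, b52, s and the optimised constants) into the point forecast and prediction interval formulas; the output should agree with the intervals above up to rounding.

import math

alpha, gamma = 0.247, 0.095
l52, b52, s, z = 315.9435, 4.5032, 27.8870, 1.96

for tau in (1, 2, 3):                     # weeks 53, 54 and 55
    point = l52 + tau * b52
    c = 1 + sum(alpha**2 * (1 + j * gamma)**2 for j in range(1, tau))
    half = z * s * math.sqrt(c)
    print(round(point - half, 4), round(point + half, 4))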

5.5 Holt-Winters methods
Holt-Winters methods are designed for time series that show a linear trend together with a seasonal pattern. The trend could be local or could hold over the entire time series. In this section, two methods are presented. One is the additive Holt-Winters method and the other is the multiplicative Holt-Winters method.

5.5.1 Additive Holt-Winters method


The additive Holt-Winters method deals with a time series that has a linear trend with a fixed growth rate, β1 , and a fixed seasonal
term, SNt , with constant additive variation in which the time series may be described by the model:

yt = (β0 + β1 t) + SNt + εt

In order to handle this model, it is easier to analyse the trend and the seasonal component separately. The seasonal component
can also be handled using the dummy variables if necessary. This method is appropriate when a time series has a linear trend
with an additive seasonal pattern for which the level, the growth rate, and the seasonal pattern may be changing. Implementation
of the additive Holt-Winters method starts with estimates of the level, the growth rate and the seasonal factor. Let ℓT −1 denote
the estimate of the level in time T − 1, and bT −1 the estimate of the growth rate in time T − 1. Suppose that we observe a new
observation yt in time period T and let snT −L be the latest estimate of the seasonal factor in time period T . As before, L is the
number of seasons. The subscript T − L of snT −L is to reflect that the time series value in time period T − L is the most recent
time series value observed in the season being analysed. Thus, this most recent time series value is used in determining snT −L .
The estimate of the level of the time series in time period T uses the smoothing constant α and is:

ℓT = α (yT − snT −L ) + (1 − α) (ℓT −1 + bT −1 )

where (yT − snT−L) is the deseasonalised observation in time period T. The estimate of the growth rate of the time series in
time period T uses the smoothing constant γ and is:

bT = γ (ℓT − ℓT −1 ) + (1 − γ) (bT −1 )

The new estimate for the seasonal factor SNT in time period T uses the smoothing constant δ and is:

snT = δ (yT − ℓT ) + (1 − δ) snT −L

where (yT − ℓT ) is an estimate of the newly observed seasonal variation.


A point forecast made in time period T for yT +τ is:

ŷT +τ (T ) = ℓT + τ bT + snT +τ −L (τ = 1, 2, 3, ...)

where snT +τ −L is the “most recent” estimate of the seasonal factor for the season corresponding to time period T + τ .
A 95% prediction interval computed in time period T is:

[ŷT+τ (T) − z0.025 s √cτ ; ŷT+τ (T) + z0.025 s √cτ]

where

cτ = 1 for τ = 1

cτ = 1 + ∑_{j=1}^{τ−1} α²(1 + jγ)² for τ = 2, 3, ..., L

cτ = 1 + ∑_{j=1}^{τ−1} [α(1 + jγ) + dj,L (1 − α)δ]² for τ = L, L + 1, L + 2, ...

where

dj,L = 1 if j is a multiple of L, and dj,L = 0 otherwise.

The standard error s computed in time period T is:


s = √(SSE/(T − 3)) = √( ∑_{t=1}^{T} [yt − ŷt(t − 1)]² / (T − 3) ) = √( ∑_{t=1}^{T} [yt − (ℓt−1 + bt−1 + snt−L)]² / (T − 3) )

The error correction forms for the smoothing equations in the additive Holt-Winters method are:

ℓT = ℓT −1 + bT −1 + α [yT − (ℓT −1 + bT −1 + snT −L )]

bT = bT −1 + αγ [yT − (ℓT −1 + bT −1 + snT −L )]

snT = snT−L + (1 − α) δ [yT − (ℓT−1 + bT−1 + snT−L)]
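One pass of the additive Holt-Winters recursions can be sketched in Python as follows; the function returns the updated level, growth rate and seasonal factor together with the one-period-ahead forecast, and its name and argument layout are illustrative only.

def additive_hw_step(obs, level, growth, sn_old, alpha, gamma, delta):
    """Update (l, b, sn) after observing obs; sn_old is sn_{T-L}."""
    forecast = level + growth + sn_old                           # y-hat_T(T-1)
    new_level = alpha * (obs - sn_old) + (1 - alpha) * (level + growth)
    new_growth = gamma * (new_level - level) + (1 - gamma) * growth
    new_sn = delta * (obs - new_level) + (1 - delta) * sn_old
    return new_level, new_growth, new_sn, forecast

For example, calling the function with obs = 10, level = 22.2, growth = 0.653, sn_old = −14.520 and (α, γ, δ) = (0.2, 0.1, 0.1) reproduces l1 = 23.186, b1 = 0.686 and sn1 = −14.387 from Activity 5.16 below.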

ACTIVITY 5.15

Suppose that a well-known commodity, transported by a large international shipping and transportation company from a foreign country, has demand that is seasonal over the quarters of a year.

(a) Determine the values of cτ .

(b) Using the results in part (a), determine cτ for τ = 1, 2, 3, 4, 5.

(c) Evaluate dj,L when:

(i) j = 2
(ii) j = 12

DISCUSSION OF ACTIVITY 5.15

(a) The number of quarters of a year is L = 4, that is the number of seasons. Hence:

cτ = 1 for τ = 1

cτ = 1 + ∑_{j=1}^{τ−1} α²(1 + jγ)² for τ = 2, 3, 4

cτ = 1 + ∑_{j=1}^{τ−1} [α(1 + jγ) + dj,4 (1 − α)δ]² for τ = 4, 5, 6, ...

(b) Using the results in part (a), and noting that dj,4 = 0 for j = 1, 2, 3 while d4,4 = 1 (since 4 is a multiple of 4), we have:

c1 = 1
c2 = 1 + ∑_{j=1}^{1} α²(1 + jγ)² = 1 + α²(1 + γ)²
c3 = 1 + ∑_{j=1}^{2} α²(1 + jγ)² = 1 + α²(1 + γ)² + α²(1 + 2γ)²
c4 = 1 + ∑_{j=1}^{3} α²(1 + jγ)² = 1 + α²(1 + γ)² + α²(1 + 2γ)² + α²(1 + 3γ)²
c5 = 1 + ∑_{j=1}^{4} [α(1 + jγ) + dj,4 (1 − α)δ]²
= 1 + α²(1 + γ)² + α²(1 + 2γ)² + α²(1 + 3γ)² + [α(1 + 4γ) + (1 − α)δ]²

(c) Using the definition of dj,L directly:

(i) d2,4 = 0 since 2 is not a multiple of 4.


(ii) d12,4 = 1 since 12 is a multiple of 4.
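Because dj,L = 0 for every j < L, the third branch of the cτ formula reduces to the second whenever τ ≤ L, so the whole definition can be written as a single Python function; this unified form is a sketch based on the formulas above.

def c_tau(tau, alpha, gamma, delta, L=4):
    """c_tau for the additive Holt-Winters prediction interval."""
    total = 1.0
    for j in range(1, tau):
        d = 1.0 if j % L == 0 else 0.0                           # d_{j,L}
        total += (alpha * (1 + j * gamma) + d * (1 - alpha) * delta) ** 2
    return total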

ACTIVITY 5.16

The following data about four-year quarterly sales of the TRK-50 mountain bike in Switzerland were presented in Exercise 6.4 in the prescribed textbook. The use of dummy variables was suggested as an analysis method in the exercise.

Year Quarter Time (t) Sales (yt )


1 1 1 10
1 2 2 31
1 3 3 43
1 4 4 16
2 1 5 11
2 2 6 33
2 3 7 45
2 4 8 17
3 1 9 13
3 2 10 34
3 3 11 48
3 4 12 19
4 1 13 15
4 2 14 37
4 3 15 51
4 4 16 21

(a) Plot sales versus time where here the time variable varies from 1 to 16.

(b) Explain why the additive Holt-Winters method is appropriate for the data.

(c) Calculate the estimates of the smoothing levels, the growth rates, the seasonal factors, the forecasts made last periods, the
forecast errors, and the squared forecast errors. Use α = 0.2, γ = 0.1 and δ = 0.1 as smoothing constants.

(d) The optimal values of the smoothing constants that minimizes SSE were found to be α = 0.561, γ = 0 and δ = 0. Repeat the
questions in part (c) using these new constants.

(e) Calculate the point forecasts and the 95% prediction interval of sales at the first 3 quarters of year 5.

DISCUSSION OF ACTIVITY 5.16

(a) The plot of sales versus time is the following:

(b) The plot indicates a linear trend and a constant (additive) seasonal variation. Hence, the data can be analysed using the additive Holt-Winters method.

(c) Fitting the linear trend model yt = β0 + β1 t + ϵt in Excel gives the following output:
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.2200
R Square 0.0484
Adjusted R Square -0.0196
Standard Error 14.2680
Observations 16

ANOVA
df SS MS F Significance F
Regression 1 144.9529 144.9529 0.7120 0.4130
Residual 14 2850.0471 203.5748
Total 15 2995

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 22.2000 7.4822 2.9670 0.0102 6.1523 38.2477
Time 0.6529 0.7738 0.8438 0.4130 -1.0067 2.3126

The fitted model is: ŷt = 22.2 + 0.6529t. Therefore, l0 = β̂0 = 22.2 and b0 = β̂1 = 0.6529.
We first calculate the regression estimates using the above fitted model. For instance:
ŷ1 = 22.2 + 0.6529(1) = 22.8529. This value and the other regression estimates, rounded to three digits after the decimal point, are in the fifth column of the following table. Next, we calculate the detrended values yt − ŷt for t = 1, 2, . . . , 16. For instance, y1 − ŷ1 = 10 − 22.853 = −12.853. This value and the other detrended values are in the sixth column of the following table.
The initial seasonal factors are the averages of the detrended values for the corresponding seasons. In the table, they are denoted at, and they represent sn1−L = sn−3, sn2−L = sn−2, sn3−L = sn−1 and sn4−L = sn0, since L = 4. As a consequence, the time index does not start from t = 1, but runs from t = −3 up to t = 16. Let us illustrate how sn−3 was calculated.

sn−3 = [(y1 − ŷ1) + (y5 − ŷ5) + (y9 − ŷ9) + (y13 − ŷ13)] / 4
= [−12.853 − 14.465 − 15.076 − 15.688] / 4
= −14.5205,

which becomes −14.521 when rounded to three digits after the decimal point. The table shows −14.520 because more digits after the decimal point were kept before the average was rounded. This value and the other three initial seasonal factors are in the seventh column of the table.

Yr Q t yt ŷt yt − ŷt at lt bt snt ŷt(t−1) et e2t
-3 -14.520
-2 6.327
-1 18.674
0 22.2 0.653 -10.479
1 1 1 10 22.853 -12.853 -14.520 23.186 0.686 -14.387 8.333 1.667 2.780
1 2 2 31 23.506 7.494 6.327 24.033 0.702 6.391 30.199 0.801 0.641
1 3 3 43 24.159 18.841 18.674 24.653 0.694 18.641 43.409 -0.409 0.167
1 4 4 16 24.812 -8.812 -10.479 25.574 0.717 -10.388 14.868 1.132 1.281
2 1 5 11 25.465 -14.465 26.110 0.699 -14.459 11.903 -0.903 0.816
2 2 6 33 26.117 6.883 26.768 0.695 6.375 33.199 -0.199 0.040
2 3 7 45 26.770 18.230 27.242 0.673 18.553 46.104 -1.104 1.220
2 4 8 17 27.423 -10.423 27.810 0.662 -10.431 17.526 -0.526 0.277
3 1 9 13 28.076 -15.076 28.269 0.642 -14.540 14.012 -1.012 1.025
3 2 10 34 28.729 5.271 28.654 0.616 6.272 35.286 -1.286 1.653
3 3 11 48 29.382 18.618 29.305 0.620 18.567 47.823 0.177 0.031
3 4 12 19 30.035 -11.035 29.826 0.610 -10.470 19.494 -0.494 0.244
4 1 13 15 30.688 -15.688 30.257 0.592 -14.612 15.896 -0.896 0.802
4 2 14 37 31.341 5.659 30.824 0.589 6.262 37.121 -0.121 0.015
4 3 15 51 31.994 19.007 31.618 0.610 18.649 49.981 1.019 1.039
4 4 16 21 32.646 -11.646 32.076 0.595 -10.531 21.757 -0.757 0.574

Now that we have the initial values of the smoothing level, the growth rate, the seasonal factors and the smoothing constants, we are ready to calculate all the other estimates.
The point forecast at t = 1 is:

ŷ1 (0) = l0 + b0 + sn−3 = 22.2 + 0.653 − 14.520 = 8.333.

The smoothing level at t = 1 is:

l1 = l0 + b0 + α[y1 − (l0 + b0 + sn−3 )]


= l0 + b0 + α[y1 − ŷ1 (0)]
= 22.2 + 0.6529 + 0.2(10 − 8.333)
= 22.2 + 0.653 + 0.2(1.667)
= 23.1864 ≈ 23.186.

The growth rate at t = 1 is:

b1 = b0 + αγ[y1 − (l0 + b0 + sn−3 )]


= b0 + αγ[y1 − ŷ1 (0)]
= 0.653 + 0.2(0.1)(10 − 8.333)
= 0.653 + 0.2(0.1)(1.667)
= 0.6863 ≈ 0.686.

The seasonal factor at time t = 1 is:
sn1 = sn−3 + (1 − α)δ[y1 − (l0 + b0 + sn−3 )]
= sn−3 + (1 − α)δ[y1 − ŷ1 (0)]
= −14.520 + 0.8(0.1)(10 − 8.333)
= −14.520 + 0.8(0.1)(1.667)
= −14.38664 ≈ −14.387.

The point forecast at t = 2 is:

ŷ2 (1) = l1 + b1 + sn−2 = 23.186 + 0.686 + 6.327 = 30.199.

The smoothing level at t = 2 is:

l2 = l1 + b1 + α[y2 − (l1 + b1 + sn−2 )]


= l1 + b1 + α[y2 − ŷ2 (1)]
= 23.186 + 0.686 + 0.2(31 − 30.199)
= 23.186 + 0.686 + 0.2(0.801)
= 24.0322 ≈ 24.032.
The growth rate at t = 2 is:

b2 = b1 + αγ[y2 − (l1 + b1 + sn−2 )]


= b1 + αγ[y2 − ŷ2 (1)]
= 0.686 + 0.2(0.1)(31 − 30.199)
= 0.686 + 0.2(0.1)(0.801)
= 0.70202 ≈ 0.702.

The seasonal factor at time t = 2 is:


sn2 = sn−2 + (1 − α)δ[y2 − (l1 + b1 + sn−2 )]
= sn−2 + (1 − α)δ[y2 − ŷ2 (1)]
= 6.327 + 0.8(0.1)(31 − 30.199)
= 6.327 + 0.8(0.1)(0.801)
= 6.39108 ≈ 6.391.

The process continues for t = 3, 4, . . . , 16. The values of lt, bt, snt and Ft = ŷt(t − 1) are in the eighth, ninth, tenth and eleventh columns of the table, respectively.

The forecast errors et = yt − ŷt(t − 1) were used in the above calculations, and are also reported in the twelfth column of the table. The squared forecast errors are in the last column of the table.

(d) The results found in part (c) served to illustrate the computation, but they are not useful for prediction since the optimal values of the smoothing constants that minimise SSE were not used. We now use the optimal values α = 0.561, γ = 0 and δ = 0.

The regression estimates, the detrended values and the initial seasonal factors remain as in part (c). However, the smoothing
levels, the growth rates, the seasonal factors and the point forecast from time t = 1 to t = 16 will change since the smoothing
constants have changed. We obtain the following:

The point forecast at t = 1 is:

ŷ1 (0) = l0 + b0 + sn−3 = 22.2 + 0.653 − 14.520 = 8.333.

The smoothing level at t = 1 is:

l1 = l0 + b0 + α[y1 − (l0 + b0 + sn−3 )]


= l0 + b0 + α[y1 − ŷ1 (0)]
= 22.2 + 0.6529 + 0.561(10 − 8.333)
= 22.2 + 0.653 + 0.561(1.667)
= 23.788087 ≈ 23.788.

The growth rate at t = 1 is:

b1 = b0 + αγ[y1 − (l0 + b0 + sn−3 )]


= b0 + αγ[y1 − ŷ1 (0)]
= 0.653 + 0.561(0)(10 − 8.333)
= 0.653.

The seasonal factor at time t = 1 is:

sn1 = sn−3 + (1 − α)δ[y1 − (l0 + b0 + sn−3 )]


= sn−3 + (1 − α)δ[y1 − ŷ1 (0)]
= −14.520 + 0.439(0)(10 − 8.333)
= −14.520.

The forecast error at time t = 1 is: y1 − ŷ1(0) = 10 − 8.333 = 1.667, and thus the squared forecast error is 2.780. The process continues for t = 2, 3, . . . , 16. The results are presented in the following table.

Yr Q t yt ŷt yt − ŷt at lt bt snt ŷt(t−1) et e2t
- - -3 - - - - - - -14.520 - - -
- - -2 - - - - - - 6.327 - - -
- - -1 - - - - - - 18.674 - - -
- - 0 - - - - 22.2 0.653 -10.479 - - -
1 1 1 10 22.853 -12.853 -14.520 23.788 0.653 -14.520 8.333 1.667 2.780
1 2 2 31 23.506 7.494 6.327 24.571 0.653 6.327 30.768 0.232 0.054
1 3 3 43 24.159 18.841 18.674 24.720 0.653 18.674 43.898 -0.898 0.807
1 4 4 16 24.812 -8.812 -10.479 25.994 0.653 -10.479 14.894 1.106 1.223
2 1 5 11 25.465 -14.465 26.015 0.653 -14.520 12.126 -1.126 1.268
2 2 6 33 26.117 6.883 26.671 0.653 6.327 32.994 0.006 0.000
2 3 7 45 26.770 18.230 26.764 0.653 18.674 45.998 -0.998 0.995
2 4 8 17 27.423 -10.423 27.452 0.653 -10.479 16.938 0.062 0.004
3 1 9 13 28.076 -15.076 27.777 0.653 -14.520 13.584 -0.584 0.341
3 2 10 34 28.729 5.271 28.005 0.653 6.327 34.757 -0.757 0.572
3 3 11 48 29.382 18.618 29.033 0.653 18.674 47.332 0.668 0.446
3 4 12 19 30.035 -11.035 29.570 0.653 -10.479 19.207 -0.207 0.043
4 1 13 15 30.688 -15.688 29.829 0.653 -14.520 15.702 -0.702 0.493
4 2 14 37 31.341 5.659 30.589 0.653 6.327 36.808 0.192 0.037
4 3 15 51 31.994 19.007 31.850 0.653 18.674 49.916 1.084 1.175
4 4 16 21 32.646 -11.646 31.929 0.653 -10.479 22.024 -1.024 1.049
Tot - - - - - - - - - - - 11.287

(e) The first three quarters of year 5 correspond to t = 17, 18, 19. The point forecast for y17 at t = 16 is:

ŷ17 (16) = l16 + b16 + sn17−L = l16 + b16 + sn13 = 31.929 + 0.653 − 14.520 = 18.062.

The point forecast for y18 at t = 16 is:

ŷ18 (16) = l16 + 2b16 + sn18−L = l16 + 2b16 + sn14 = 31.929 + 2(0.653) + 6.327 = 39.562.

The point forecast for y19 at t = 16 is:

ŷ19 (16) = l16 + 3b16 + sn19−L = l16 + 3b16 + sn15 = 31.929 + 3(0.653) + 18.674 = 52.562.

We need the standard error before calculating the prediction interval. In this case, we have:

s = √(SSE/(T − 3)) = √(11.287/(16 − 3)) = √(11.287/13) = 0.9318.
The 95% prediction interval for y17 at t = 16 is:

ŷ17 (16) ± z0.025 s √c1 = 18.062 ± 1.96(0.9318) √1
= 18.062 ± 1.8263
= [16.2357; 19.8883].

The 95% prediction interval for y18 at t = 16 is:

ŷ18 (16) ± z0.025 s √c2 = 39.562 ± 1.96(0.9318) √(1 + α²(1 + γ)²)
= 39.562 ± 1.96(0.9318) √(1 + (0.561)²(1 + 0)²)
= 39.562 ± 2.0941
= [37.4679; 41.6561].

The 95% prediction interval for y19 at t = 16 is:

ŷ19 (16) ± z0.025 s √c3 = 52.562 ± 1.96(0.9318) √(1 + α²(1 + γ)² + α²(1 + 2γ)²)
= 52.562 ± 1.96(0.9318) √(1 + (0.561)²(1 + 0)² + (0.561)²(1 + 2(0))²)
= 52.562 ± 1.96(0.9318) √(1 + 2(0.561)²)
= 52.562 ± 2.3313
= [50.2307; 54.8933].

5.5.2 Multiplicative Holt-Winters method


The multiplicative Holt-Winters method is used for a time series that has a linear trend with a fixed growth rate, β1 , and a fixed
seasonal pattern, SNt , with increasing or multiplicative variation. It is appropriate when the level, growth rate and seasonal pattern
may be changing rather than being fixed. This type of time series may be described using the multiplicative model:

yt = (β0 + β1 t) × SNt × IRt

In Unit 4 we showed how to estimate the fixed seasonal factors, SNt , by using centred moving averages. The level at time
period T − 1 for this model is given by β0 + β1 (T − 1), and the level at time period T is given by β0 + β1 T . This shows that a
growth rate for the level is β1 .
The implementation of the multiplicative Holt-Winters method needs to estimate the smoothed level, the growth rate and the
seasonal factor. Let ℓT −1 denote the estimate of the level in time T − 1, and bT −1 denote the growth rate in time T − 1. Then,
suppose that we observe a new observation yT in time period T , and let snT −L be the latest estimate of the seasonal factor in time
period T . As before, L is the number of seasons. The subscript T − L of snT −L is to reflect that the time series value in time period
T − L is the most recent time series value observed in the season being analysed. Thus, this most recent time series value is used
in determining snT −L .
The estimate of the level of the time series in time period T uses the smoothing constant α and is:
ℓT = α(yT / snT−L) + (1 − α)(ℓT−1 + bT−1)

where yT / snT−L is the deseasonalised observation in time period T. The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:

bT = γ (ℓT − ℓT −1 ) + (1 − γ) (bT −1 )

The new estimate for the seasonal factor SNT in time period T uses the smoothing constant δ and is:
snT = δ(yT / ℓT) + (1 − δ) snT−L

where yT / ℓT is an estimate of the newly observed seasonal variation.
A point forecast made in time period T for yT +τ is:

ŷT+τ (T) = (ℓT + τbT) snT+τ−L (τ = 1, 2, 3, ...)

where snT +τ −L is the “most recent” estimate of the seasonal factor for the season corresponding to time period T + τ .

A 95% prediction interval computed in time period T is:


[ŷT+τ (T) − z0.025 sr √cτ (snT+τ−L) ; ŷT+τ (T) + z0.025 sr √cτ (snT+τ−L)]

where

c1 = (ℓT + bT)²
c2 = α²(1 + γ)²(ℓT + bT)² + (ℓT + 2bT)²
c3 = α²(1 + 2γ)²(ℓT + bT)² + α²(1 + γ)²(ℓT + 2bT)² + (ℓT + 3bT)²
cτ = ∑_{j=1}^{τ−1} α²(1 + [τ − j]γ)²(ℓT + jbT)² + (ℓT + τbT)², if 2 ≤ τ ≤ L

The standard error sr computed in time period T is:


sr = √( ∑_{t=1}^{T} [ (yt − ŷt(t − 1)) / ŷt(t − 1) ]² / (T − 3) ) = √( ∑_{t=1}^{T} [ (yt − (ℓt−1 + bt−1) snt−L) / ((ℓt−1 + bt−1) snt−L) ]² / (T − 3) )

The error correction forms for the smoothing equations in the multiplicative Holt-Winters method are:

ℓT = ℓT−1 + bT−1 + α [ (yT − (ℓT−1 + bT−1) snT−L) / snT−L ]

bT = bT−1 + αγ [ (yT − (ℓT−1 + bT−1) snT−L) / snT−L ]

snT = snT−L + (1 − α)δ [ (yT − (ℓT−1 + bT−1) snT−L) / (ℓT−1 + bT−1) ]

ACTIVITY 5.17

The following data for quarterly sales (in 1000s of cases) of Tiger Sports Drink for eight consecutive years can be found on
page 377 in the prescribed textbook. The investigator wanted to estimate the point forecasts and the 95% prediction interval of the
number of cases to be sold in the four quarters of the 9th year.

Year
Quarter 1 2 3 4 5 6 7 8
1 72 77 81 87 94 102 106 115
2 116 123 131 140 147 162 170 177
3 136 146 158 167 177 191 200 218
4 96 101 109 120 128 134 142 149

(a) Plot sales versus time where here the time variable varies from 1 to 32.

(b) Explain why the multiplicative Holt-Winters method is appropriate for the data.

(c) Calculate the estimates of the smoothing levels, the growth rates, the seasonal factors, the forecasts made last periods, the
forecast errors, and the squared forecast errors. Use α = 0.2, γ = 0.1 and δ = 0.1 as smoothing constants.

(d) The optimal values of the smoothing constants that minimizes SSE were found to be α = 0.336, γ = 0.046 and δ = 0.134.
Repeat the questions in part (c) using these new constants.

(e) Calculate the point forecasts and the 95% prediction interval of sales at the 4 quarters of year 9.

DISCUSSION OF ACTIVITY 5.17

(a) The plot of sales versus time is the following:

(b) The graph indicates a linearly increasing trend, and the seasonal pattern increases with time. Thus the multiplicative Holt-Winters method may be appropriate for analysing the data.

(c) Fitting the simple linear regression model yt = β0 + β1 t + ϵt to the first 16 observations gives the following output.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.4038
R Square 0.1631
Adjusted R Square 0.1033
Standard Error 27.5833
Observations 16

ANOVA
df SS MS F Significance F
Regression 1 2075.2941 2075.2941 2.7276 0.1209
Residual 14 10651.7059 760.8361
Total 15 12727

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 95.2500 14.4648 6.5850 0.0000 64.2261 126.2739
Time 1 2.4706 1.4959 1.6516 0.1209 -0.7378 5.6790
Hence, the fitted model is: ŷt = 95.25 + 2.4706t. The initial smoothing level and the initial growth rate can thus be chosen as
l0 = 95.25 and b0 = 2.4706.
The initial seasonal factors can be found using the following four steps:

Step 1: Use the fitted model to calculate the regression estimates for the first 16 observations. The regression estimates are
reported in the fifth column of the following table.

Y Q T Sales Reg Est Detr Quart 1 Quart 2 Quart 3 Quart 4 Index


1 1 1 72 97.6706 0.7372 0.7372 1.1584 1.3254 0.9136 0.7062
1 2 2 116 100.1412 1.1584 0.7159 1.1179 1.2978 0.8785 1.1115
1 3 3 136 102.6118 1.3254 0.6897 1.0925 1.2911 0.8731 1.2937
1 4 4 96 105.0824 0.9136 0.6833 1.0787 1.2627 0.8907 0.8886
2 1 5 77 107.553 0.7159
2 2 6 123 110.0236 1.1179
2 3 7 146 112.4942 1.2978
2 4 8 101 114.9648 0.8785
3 1 9 81 117.4354 0.6897
3 2 10 131 119.906 1.0925
3 3 11 158 122.3766 1.2911
3 4 12 109 124.8472 0.8731
4 1 13 87 127.3178 0.6833
4 2 14 140 129.7884 1.0787
4 3 15 167 132.259 1.2627
4 4 16 120 134.7296 0.8907
Averages 0.7065 1.1119 1.2942 0.8890 4

Step 2: We detrend the data by calculating the ratios St = yt /ŷt .


For example, S1 = y1/ŷ1 = 72/97.6706 ≈ 0.7372. The other values are calculated in a similar way. The results are reported in the sixth column of the table.

Step 3: We calculate the average per season (here, quarter) of the detrended values. For quarter 1, the calculation is the following:

S̄1 = [(y1/ŷ1) + (y5/ŷ5) + (y9/ŷ9) + (y13/ŷ13)] / 4
= (S1 + S5 + S9 + S13) / 4
= (0.7372 + 0.7159 + 0.6897 + 0.6833) / 4
= 0.7065.

Then, S̄2 , S̄3 and S̄4 are calculated in a similar way. The results are reported on the bottom of the table in the columns
corresponding to the quarters.

Step 4: The averages S̄i, i = 1, . . . , 4, are not likely to sum to L = 4. Therefore, we multiply each average by the following correction factor:

CF = L / ∑_{i=1}^{L} S̄i .

In the present case, the sum of the averages is:

0.7065 + 1.1119 + 1.2942 + 0.8890 = 4.0016.

Therefore, the initial seasonal factors (indices) are:

sn1−L = sn1−4 = sn−3 = 0.7065 × 4 / 4.0016 = 0.7062
sn2−L = sn2−4 = sn−2 = 1.1119 × 4 / 4.0016 = 1.1115
sn3−L = sn3−4 = sn−1 = 1.2942 × 4 / 4.0016 = 1.2937
sn4−L = sn4−4 = sn0 = 0.8890 × 4 / 4.0016 = 0.8886

These results are reported in the last column of the table. As expected they sum to 4. Now, we are ready to estimate the
components of the multiplicative Holt-Winters model.
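Steps 1 to 4 can be condensed into a short Python sketch; it assumes the fitted trend coefficients and the quarterly layout used above, and its output should agree with the table up to small rounding differences in the intermediate values.

def initial_indices(y, b0_hat, b1_hat, L=4):
    """Return the corrected initial seasonal indices sn_{1-L}, ..., sn_0."""
    # Steps 1-2: regression estimates and detrended ratios S_t = y_t / y-hat_t
    ratios = [obs / (b0_hat + b1_hat * (t + 1)) for t, obs in enumerate(y)]
    # Step 3: average the ratios season by season
    averages = [sum(ratios[q::L]) / (len(y) // L) for q in range(L)]
    # Step 4: rescale so that the indices sum to L
    cf = L / sum(averages)
    return [a * cf for a in averages]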

The point forecast at t = 1, obtained using ŷT +τ = (lT + τ bT )snT +τ −L , (τ = 1, 2, . . .) is:

ŷ1 (0) = (l0 + 1b0 )sn0+1−4 = (l0 + b0 )sn−3 = (95.25 + 2.4706)(0.7062) = 69.0103.

The smoothing level at t = 1, obtained using

ℓT = α(yT / snT−L) + (1 − α)(ℓT−1 + bT−1),

is:

l1 = α(y1 / sn1−4) + (1 − α)(ℓ0 + b0)
= 0.2(y1 / sn−3) + (0.8)(ℓ0 + b0)
= 0.2(72 / 0.7062) + (0.8)(95.25 + 2.4706)
= 98.5673.

The growth rate at t = 1, obtained using


bT = γ (ℓT − ℓT −1 ) + (1 − γ) (bT −1 ),

is:

b1 = γ (ℓ1 − ℓ1−1 ) + (1 − γ) (b1−1 )


= γ (ℓ1 − ℓ0 ) + (1 − γ) (b0 )
= 0.1 (98.5673 − 95.25) + (0.9) (2.4706)
= 2.5553

The seasonal factor at time t = 1, obtained using

snT = δ(yT / ℓT) + (1 − δ) snT−L,

is:

sn1 = δ(y1 / ℓ1) + (1 − δ) sn1−4
= δ(y1 / ℓ1) + (1 − δ) sn−3
= 0.1(72 / 98.5673) + (0.9)(0.7062)
= 0.7086.

The point forecast at t = 2, obtained using ŷT +τ = (lT + τ bT )snT +τ −L , (τ = 1, 2, . . .) is:

ŷ2 (1) = (l1 + b1 )sn1+2−4 = (l1 + b1 )sn−2 = (98.5673 + 2.5553)(1.1115) = 112.3978.

The smoothing level at t = 2 is:

l2 = α(y2 / sn2−4) + (1 − α)(ℓ1 + b1)
= 0.2(y2 / sn−2) + (0.8)(ℓ1 + b1)
= 0.2(116 / 1.1115) + (0.8)(98.5673 + 2.5553)
= 101.7708.

The growth rate at t = 2 is:

b2 = γ (ℓ2 − ℓ2−1 ) + (1 − γ) (b2−1 )


= γ (ℓ2 − ℓ1 ) + (1 − γ) (b1 )
= 0.1 (101.7708 − 98.5673) + (0.9) (2.5553)
= 2.6201.

The seasonal factor at time t = 2 is:

sn2 = δ(y2 / ℓ2) + (1 − δ) sn2−4
= δ(y2 / ℓ2) + (1 − δ) sn−2
= 0.1(116 / 101.7708) + (0.9)(1.1115)
= 1.1143.

Continuing the process until t = 32 gives the values in the following table:

Y Q Time (t) Sales (yt ) lt bt snt ŷt (t − 1) et e2t


- - -3 - - - 0.7062 - - -
- - -2 - - - 1.1115 - - -
- - -1 - - - 1.2937 - - -
- - 0 - 95.25 2.4706 0.8886 - - -
1 1 1 72 98.5673 2.5553 0.7086 69.0103 2.9897 8.9384
1 2 2 116 101.7708 2.6201 1.1143 112.3977 3.6023 12.9763
1 3 3 136 104.5376 2.6348 1.2944 135.0504 0.9496 0.9017
1 4 4 96 107.3449 2.6520 0.8892 95.2334 0.7666 0.5877
2 1 5 77 109.7298 2.6253 0.7079 77.9468 -0.9468 0.8964
2 2 6 123 111.9601 2.5858 1.1128 125.2008 -2.2008 4.8435
2 3 7 146 114.1949 2.5507 1.2928 148.2712 -2.2712 5.1584
2 4 8 101 116.1143 2.4876 0.8872 103.8069 -2.8069 7.8786
3 1 9 81 117.7649 2.4039 0.7059 83.9626 -2.9626 8.7768
3 2 10 131 119.6801 2.3550 1.1109 133.7189 -2.7189 7.3925
3 3 11 158 122.0705 2.3585 1.2930 157.7713 0.2287 0.0523
3 4 12 109 124.1139 2.3270 0.8863 110.3981 -1.3981 1.9547
4 1 13 87 125.8013 2.2631 0.7045 89.2576 -2.2576 5.0970
4 2 14 140 127.6553 2.2222 1.1095 142.2720 -2.2720 5.1621
4 3 15 167 129.7337 2.2078 1.2924 167.9297 -0.9297 0.8644
4 4 16 120 132.6309 2.2767 0.8882 116.9445 3.0555 9.3360
5 1 17 94 134.6122 2.2472 0.7039 95.0408 -1.0408 1.0833
5 2 18 147 135.9855 2.1598 1.1067 151.8479 -4.8479 23.5021
5 3 19 177 137.9069 2.1360 1.2915 178.5406 -1.5406 2.3736
5 4 20 128 140.8573 2.2174 0.8902 124.3831 3.6169 13.0817
6 1 21 102 143.4424 2.2542 0.7046 100.7059 1.2941 1.6747
6 2 22 162 145.8344 2.2680 1.1071 161.2374 0.7626 0.5816
6 3 23 191 148.0594 2.2637 1.2914 191.2769 -0.2769 0.0767
6 4 24 134 150.3630 2.2676 0.8903 133.8227 0.1773 0.0314
7 1 25 106 152.1928 2.2239 0.7038 107.5422 -1.5422 2.3783
7 2 26 170 154.2447 2.2067 1.1066 170.9523 -0.9523 0.9069
7 3 27 200 156.1360 2.1751 1.2903 202.0364 -2.0364 4.1469
7 4 28 142 158.5472 2.1987 0.8909 140.9488 1.0512 1.1051
8 1 29 115 161.2774 2.2519 0.7047 113.1299 1.8701 3.4973
8 2 30 177 162.8136 2.1803 1.1046 180.9599 -3.9599 15.6812
8 3 31 218 165.7851 2.2594 1.2928 212.8959 5.1041 26.0522
8 4 32 149 167.8865 2.2436 0.8905 149.7038 -0.7038 0.4954

(d) The results found in part (c) served to illustrate the computation, but they are not useful for prediction since the optimal values of the smoothing constants that minimise SSE were not used. We now use the optimal values α = 0.336, γ = 0.046 and δ = 0.134.

The regression estimates, the detrended values and the initial seasonal factors remain as in part (c). However, the smoothing
levels, the growth rates, the seasonal factors and the point forecast from time t = 1 to t = 32 will change since the smoothing
constants have changed. We obtain the following:

ŷ1 (0) = (l0 + 1b0 )sn0+1−4 = (l0 + b0 )sn−3 = (95.25 + 2.4706)(0.7062) = 69.0103.

The smoothing level at t = 1 is:

l1 = α(y1 / sn1−4) + (1 − α)(ℓ0 + b0)
= α(y1 / sn−3) + (1 − α)(ℓ0 + b0)
= 0.336(72 / 0.7062) + (0.664)(95.25 + 2.4706)
= 99.1431.

The growth rate at t = 1 is:

b1 = γ (ℓ1 − ℓ1−1 ) + (1 − γ) (b1−1 )


= γ (ℓ1 − ℓ0 ) + (1 − γ) (b0 )
= 0.046 (99.1431 − 95.25) + (0.954) (2.4706)
= 2.5360.

The seasonal factor at time t = 1 is:

sn1 = δ(y1 / ℓ1) + (1 − δ) sn1−4
= δ(y1 / ℓ1) + (1 − δ) sn−3
= 0.134(72 / 99.1431) + (0.866)(0.7062)
= 0.7089.

The process continues for t = 2, 3, . . . , 32. The results are presented in the following table.

Y Q T yt lt bt snt ŷt (t − 1) et e2t srt2
- - -3 - - - 0.7062 - - - -
- - -2 - - - 1.1115 - - - -
- - -1 - - - 1.2937 - - - -
- - 0 - 95.25 2.4706 0.8886 - - - -
1 1 1 72 99.1431 2.5360 0.7089 69.0103 2.9897 8.9384 0.00187686
1 2 2 116 102.5810 2.5775 1.1141 113.0163 2.9837 8.9024 0.00069699
1 3 3 136 105.1472 2.5770 1.2937 136.0436 -0.0436 0.0019 0.00000010
1 4 4 96 107.8287 2.5818 0.8888 95.7238 0.2762 0.0763 0.00000833
2 1 5 77 109.8094 2.5542 0.7079 78.2681 -1.2681 1.6082 0.00026252
2 2 6 123 111.7052 2.5239 1.1123 125.1829 -2.1829 4.7651 0.00030407
2 3 7 146 113.7684 2.5027 1.2923 147.7740 -1.7740 3.1470 0.00014411
2 4 8 101 115.3846 2.4619 0.8870 103.3449 -2.3449 5.4988 0.00051486
3 1 9 81 116.6986 2.4091 0.7060 83.4183 -2.4183 5.8481 0.00084042
3 2 10 131 118.6578 2.3884 1.1112 132.4893 -1.4893 2.2181 0.00012636
3 3 11 158 121.4557 2.4072 1.2934 156.4251 1.5749 2.4804 0.00010137
3 4 12 109 123.5338 2.3921 0.8864 109.8689 -0.8689 0.7549 0.00006254
4 1 13 87 125.0192 2.3504 0.7047 88.9052 -1.9052 3.6297 0.00045922
4 2 14 140 126.9048 2.3290 1.1102 141.5372 -1.5372 2.3631 0.00011796
4 3 15 167 129.1936 2.3272 1.2933 167.1548 -0.1548 0.0240 0.00000086
4 4 16 120 132.8175 2.3868 0.8887 116.5792 3.4208 11.7019 0.00086102
5 1 17 94 134.5975 2.3589 0.7038 95.2725 -1.2725 1.6192 0.00017839
5 2 18 147 135.4302 2.2887 1.1068 152.0428 -5.0428 25.4298 0.00110005
5 3 19 177 137.4292 2.2754 1.2926 178.1149 -1.1149 1.2431 0.00003918
5 4 20 128 141.1589 2.3423 0.8911 124.1534 3.8466 14.7962 0.00095991
6 1 21 102 143.9794 2.3643 0.7044 100.9982 1.0018 1.0035 0.00009838
6 2 22 162 146.3500 2.3646 1.1069 161.9793 0.0207 0.0004 0.00000002
6 3 23 191 148.3952 2.3499 1.2919 192.2285 -1.2285 1.5093 0.00004085
6 4 24 134 150.6205 2.3441 0.8909 134.3304 -0.3304 0.1092 0.00000605
7 1 25 106 152.1282 2.3057 0.7034 107.7534 -1.7534 3.0745 0.00026479
7 2 26 170 154.1498 2.2926 1.1063 170.9358 -0.9358 0.8757 0.00002997
7 3 27 200 155.8956 2.2674 1.2907 202.1024 -2.1024 4.4200 0.00010821
7 4 28 142 158.5742 2.2864 0.8915 140.9098 1.0902 1.1885 0.00005986
8 1 29 115 161.7440 2.3270 0.7044 113.1506 1.8494 3.4201 0.00026713
8 2 30 177 162.7000 2.2639 1.1038 181.5140 -4.5140 20.3760 0.00061844
8 3 31 218 166.2882 2.3248 1.2934 212.9131 5.0869 25.8769 0.00057083
8 4 32 149 168.1144 2.3019 0.8908 150.3230 -1.3230 1.7504 0.00007746
- - - - - - - - - - 0.01079712

In the above table, sr²t is the squared relative error:

sr²t = [(yt − ŷt(t−1)) / ŷt(t−1)]².
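
As a check, the recursion above is easy to reproduce with a short script. The following Python sketch is illustrative only (the variable names and the script itself are not part of the prescribed material); it uses the initial values and the optimal smoothing constants of this example and recomputes the level, the growth rate, the seasonal factors and the sum of the squared relative errors:

# A sketch of the multiplicative Holt-Winters recursion for this example.
# Smoothing constants and initial values are taken from parts (c)-(d).
alpha, gamma, delta = 0.336, 0.046, 0.134
L = 4                                       # quarterly data, so 4 seasons

level, growth = 95.25, 2.4706               # l0 and b0 from the regression step
sn = [0.7062, 1.1115, 1.2937, 0.8886]       # sn_{-3}, sn_{-2}, sn_{-1}, sn_0

y = [72, 116, 136, 96, 77, 123, 146, 101, 81, 131, 158, 109,
     87, 140, 167, 120, 94, 147, 177, 128, 102, 162, 191, 134,
     106, 170, 200, 142, 115, 177, 218, 149]

sum_sq_rel_err = 0.0
for t, y_t in enumerate(y, start=1):
    s_old = sn[(t - 1) % L]                 # seasonal factor sn_{t-L}
    y_hat = (level + growth) * s_old        # one-step-ahead forecast ŷ_t(t-1)
    sum_sq_rel_err += ((y_t - y_hat) / y_hat) ** 2
    new_level = alpha * (y_t / s_old) + (1 - alpha) * (level + growth)
    growth = gamma * (new_level - level) + (1 - gamma) * growth
    sn[(t - 1) % L] = delta * (y_t / new_level) + (1 - delta) * s_old
    level = new_level

print(level, growth, sum_sq_rel_err)

Up to rounding, the printed values should agree with the last rows of the table: l32 ≈ 168.11, b32 ≈ 2.30 and a squared relative error total of about 0.0108.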

(e) The point forecasts made at time 32 of y33 , y34 , y35 and y36 are the following:

ŷ33(32) = (l32 + b32)sn33−4 = (l32 + b32)sn29
        = (168.1144 + 2.3019)(0.7044)
        = 120.0412.

ŷ34(32) = (l32 + 2b32)sn34−4 = (l32 + 2b32)sn30
        = (168.1144 + 2(2.3019))(1.1038)
        = 190.6463.

ŷ35(32) = (l32 + 3b32)sn35−4 = (l32 + 3b32)sn31
        = (168.1144 + 3(2.3019))(1.2934)
        = 226.3710.

ŷ36(32) = (l32 + 4b32)sn36−4 = (l32 + 4b32)sn32
        = (168.1144 + 4(2.3019))(0.8908)
        = 157.9584.

To calculate the 95% prediction intervals, we first compute the relative standard error as follows:

sr = √( ∑_{t=1}^{T} [(yt − ŷt(t−1)) / ŷt(t−1)]² / (T − 3) )

   = √( ∑_{t=1}^{32} [(yt − ŷt(t−1)) / ŷt(t−1)]² / (32 − 3) )

   = √( 0.01079712 / 29 )

   = 0.0193.

The 95% prediction interval for y33 is:

ŷ33(32) ± z0.025 sr √c1 sn33−L = ŷ33(32) ± z0.025 sr √((l32 + b32)²) sn29
= 120.0412 ± 1.96(0.0193)(168.1144 + 2.3019)(0.7044)
= 120.0412 ± 4.5409
= [115.5003; 124.5821].

The 95% prediction interval for y34 is:

ŷ34(32) ± z0.025 sr √c2 sn34−L = ŷ34(32) ± z0.025 sr √c2 sn30

where

c2 = α²(1 + γ)²(lT + bT)² + (lT + 2bT)²
   = α²(1 + γ)²(l32 + b32)² + (l32 + 2b32)²
   = (0.336)²(1 + 0.046)²(168.1144 + 2.3019)² + (168.1144 + 2(2.3019))²
   = 33418.8476.

Hence, the 95% prediction interval of y34 is:

ŷ34(32) ± z0.025 sr √c2 sn34−L = ŷ34(32) ± z0.025 sr √c2 sn30
= 190.6463 ± 1.96(0.0193)√33418.8476 (1.1038)
= 190.6463 ± 7.6331
= [183.0132; 198.2794].

The 95% prediction interval for y35 is:

ŷ35(32) ± z0.025 sr √c3 sn35−L = ŷ35(32) ± z0.025 sr √c3 sn31

where

c3 = α²(1 + 2γ)²(l32 + b32)² + α²(1 + γ)²(l32 + 2b32)² + (l32 + 3b32)²
   = (0.336)²(1 + 2(0.046))²(168.1144 + 2.3019)²
   + (0.336)²(1 + 0.046)²(168.1144 + 2(2.3019))² + (168.1144 + 3(2.3019))²
   = 38226.5951.
Hence, the 95% prediction interval of y35 is:

ŷ35(32) ± z0.025 sr √c3 sn35−L = ŷ35(32) ± z0.025 sr √c3 sn31
= 226.3710 ± 1.96(0.0193)√38226.5951 (1.2934)
= 226.3710 ± 9.5660
= [216.8050; 235.9370].
The 95% prediction interval for y36 is:

ŷ36(32) ± z0.025 sr √c4 sn36−L = ŷ36(32) ± z0.025 sr √c4 sn32

where

c4 = ∑_{j=1}^{3} α²(1 + [4 − j]γ)²(lT + jbT)² + (lT + 4bT)²
   = α²(1 + 3γ)²(l32 + b32)² + α²(1 + 2γ)²(l32 + 2b32)²
   + α²(1 + γ)²(l32 + 3b32)² + (l32 + 4b32)²
   = (0.336)²(1 + 3(0.046))²(168.1144 + 2.3019)²
   + (0.336)²(1 + 2(0.046))²(168.1144 + 2(2.3019))²
   + (0.336)²(1 + 0.046)²(168.1144 + 3(2.3019))² + (168.1144 + 4(2.3019))²
   = 43488.912.
Hence, the 95% prediction interval of y36 is:

ŷ36(32) ± z0.025 sr √c4 sn36−4 = ŷ36(32) ± z0.025 sr √c4 sn32
= 157.9584 ± 1.96(0.0193)√43488.912 (0.8908)
= 157.9584 ± 7.0272
= [150.9312; 164.9856].
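
These four intervals can be verified with a few lines of code. The sketch below is illustrative only: the end-of-sample values l32 and b32, the seasonal factors and sr are hard-coded from the table above, and the function name is ours, not prescribed.

from math import sqrt

# End-of-sample values, hard-coded from the table above (illustrative).
alpha, gamma = 0.336, 0.046
l32, b32 = 168.1144, 2.3019
sn = {29: 0.7044, 30: 1.1038, 31: 1.2934, 32: 0.8908}
sr, z = 0.0193, 1.96

def c(tau):
    # c_tau = sum_{j=1}^{tau-1} alpha^2 (1 + [tau-j] gamma)^2 (l + j b)^2
    #         + (l + tau b)^2, so that c_1 = (l + b)^2.
    total = (l32 + tau * b32) ** 2
    for j in range(1, tau):
        total += alpha ** 2 * (1 + (tau - j) * gamma) ** 2 * (l32 + j * b32) ** 2
    return total

for tau in range(1, 5):
    s_factor = sn[28 + tau]                  # sn_{32+tau-4}
    point = (l32 + tau * b32) * s_factor     # point forecast ŷ_{32+tau}(32)
    half = z * sr * sqrt(c(tau)) * s_factor  # 95% interval half-width
    print(32 + tau, round(point, 4), round(half, 4))

For τ = 1 the loop body is never executed, so c1 reduces to (l32 + b32)², exactly as in the hand computation.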

5.6 Damped trend exponential smoothing


A time series may have a growth rate that will not be sustained into the future and whose effect therefore needs to be dampened.
The damped trend method, known as Gardner and McKenzie's damped trend exponential smoothing, may be used for this
purpose. Damping the growth rate means reducing it in size so that the rate of increase or decrease of the forecasts slows down.
In the following sections we present various cases, including: (1) the damped trend method with no seasonal pattern, and (2) the
Holt-Winters methods with damped trend.

5.6.1 Damped trend method


Consider a time series y1 , y2 , . . . , yn that exhibits a linear trend for which the level and growth rate are changing over time but with
no seasonal pattern. The estimates for the smoothing level and growth rate are given by the equations

lT = αyT + (1 − α)(lT −1 + ϕbT −1 )


bT = γ(lT − lT −1 ) + (1 − γ)ϕbT −1

where α and γ are smoothing constants between 0 and 1, and ϕ is a damping factor between 0 and 1.

A point forecast made in time period T for yT +τ is:

ŷT +τ (T ) = lT + (ϕ + ϕ2 + ⋯ + ϕτ )bT .
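
The recursion and the point forecast are straightforward to implement. The following Python sketch is illustrative (the function and its arguments are our own naming, not prescribed material):

def damped_trend_forecast(y, alpha, gamma, phi, l0, b0, tau):
    # Run the damped trend smoothing recursion over the observed series.
    level, growth = l0, b0
    for y_t in y:
        new_level = alpha * y_t + (1 - alpha) * (level + phi * growth)
        growth = gamma * (new_level - level) + (1 - gamma) * phi * growth
        level = new_level
    # Point forecast: l_T + (phi + phi^2 + ... + phi^tau) b_T.
    phi_sum = sum(phi ** j for j in range(1, tau + 1))
    return level + phi_sum * growth

With ϕ = 1 the recursion reduces to Holt's trend corrected exponential smoothing, and with ϕ = 0 to simple exponential smoothing; Activities 5.18 and 5.19 below explore exactly this behaviour.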

ACTIVITY 5.18

Which values of the damping factor ϕ are associated with:

(a) Meager (or weak) dampening?

(b) Substantial dampening?

DISCUSSION OF ACTIVITY 5.18

We know that the values of the damping factor lie between 0 and 1. Since ϕ multiplies the previous growth rate in the recursion,
values near 1 leave the growth rate almost intact and so have less dampening effect than values near 0, which shrink it rapidly.
Hence:

(a) Meager (or weak) dampening will be effected with values near 1.

(b) Substantial dampening will be effected with values near 0.


One may wonder why the values 0 and 1 themselves are excluded as possible damping factors.

ACTIVITY 5.19

What happens if the damping factor is set equal:

(a) to 0?

(b) to 1?

DISCUSSION OF ACTIVITY 5.19

(a) When ϕ = 0, then lT = αyT + (1 − α)lT −1 , which is the estimate of the smoothing level at time T for simple exponential
smoothing.

(b) When ϕ = 1, then lT = αyT + (1 − α)(lT −1 + bT −1 ) and bT = γ(lT − lT −1 ) + (1 − γ)bT −1 , which are the estimates of the
smoothing level and the growth rate at time T for Holt's trend corrected exponential smoothing.

Once the point forecast is determined, one can also determine the interval prediction.

If τ = 1, then a 95% prediction interval computed at time T for yT +1 is:

ŷT +1 (T ) ± z0.025 s

where
s = √(SSE/(T − 2)) = √( ∑_{t=1}^{T} [yt − (ℓt−1 + ϕbt−1)]² / (T − 2) ).

If τ = 2, then a 95% prediction interval computed at time T for yT +2 is:


ŷT+2(T) ± z0.025 s √(1 + α²(1 + ϕγ)²)

If τ = 3, then a 95% prediction interval computed at time T for yT +3 is:


ŷT+3(T) ± z0.025 s √(1 + α²(1 + ϕγ)² + α²(1 + ϕγ + ϕ²γ)²)

If τ ≥ 4, then a 95% prediction interval computed at time T for yT +τ is:
ŷT+τ(T) ± z0.025 s √( 1 + ∑_{j=1}^{τ−1} α²(1 + ϕj γ)² )

where ϕj = ϕ + ϕ² + ⋯ + ϕ^j.
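
Since ϕj is a running sum of powers of ϕ, the radicand can be accumulated term by term. A minimal sketch, assuming s has already been computed from the SSE (the function name is illustrative):

def damped_pi_halfwidth(tau, alpha, gamma, phi, s, z=1.96):
    # Radicand 1 + sum_{j=1}^{tau-1} alpha^2 (1 + phi_j gamma)^2,
    # where phi_j = phi + phi^2 + ... + phi^j is accumulated as we go.
    radicand, phi_j = 1.0, 0.0
    for j in range(1, tau):
        phi_j += phi ** j
        radicand += alpha ** 2 * (1 + phi_j * gamma) ** 2
    return z * s * radicand ** 0.5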

ACTIVITY 5.20

Show that the error correction forms of the damped trend exponential smoothing equations are:

lT = lT −1 + ϕbT −1 + α[yT − (lT −1 + ϕbT −1 )]

bT = ϕbT −1 + αγ[yT − (lT −1 + ϕbT −1 )].

DISCUSSION OF ACTIVITY 5.20

lT = αyT + (1 − α)(lT −1 + ϕbT −1 )
= αyT + lT −1 + ϕbT −1 − αlT −1 − αϕbT −1
= lT −1 + ϕbT −1 + α[yT − (lT −1 + ϕbT −1 )]

and

bT = γ(lT − lT −1 ) + (1 − γ)ϕbT −1
= γlT − γlT −1 + ϕbT −1 − γϕbT −1
= ϕbT −1 + γ[lT − (lT −1 + ϕbT −1 )]
= ϕbT −1 + αγ[yT − (lT −1 + ϕbT −1 )],

where the last step uses lT − (lT −1 + ϕbT −1 ) = α[yT − (lT −1 + ϕbT −1 )], which follows from the first result.

5.6.2 Additive Holt-Winters with damped trend

Remember that the additive Holt-Winters method is appropriate for time series with a linear trend (whose level and growth rate
may change over time) and constant seasonal variation. The results in Section 5.5.1 imply that estimates of the smoothing level,
the growth rate and the seasonal component for the additive Holt-Winters method with damped trend are the following:

ℓT = α(yT − snT −L ) + (1 − α)(ℓT −1 + ϕbT −1 )

bT = γ(ℓT − ℓT −1 ) + (1 − γ)ϕbT −1

snT = δ(yT − ℓT ) + (1 − δ)snT −L

A point forecast made in time period T for yT +τ is:

ŷT+τ(T) = lT + (ϕ + ϕ² + ⋯ + ϕ^τ)bT + snT+τ−L.

A 95% prediction interval computed in time period T for yT +τ is:


ŷT+τ(T) ± z0.025 s √cτ

where s is as for the additive Holt-Winters method, cτ = 1 for τ = 1, and, for τ ≥ 2,

cτ = 1 + ∑_{j=1}^{τ−1} [α(1 + ϕj γ) + dj,L (1 − α)δ]²

with dj,L = 1 if j is an integer multiple of L and 0 otherwise, and ϕj = ϕ + ϕ² + ⋯ + ϕ^j.
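
The only new ingredient relative to the damped trend intervals is the indicator dj,L, which adds the seasonal-update term whenever j is a multiple of the season length. A minimal sketch (the function name is illustrative):

def c_tau_additive_damped(tau, alpha, gamma, delta, phi, L):
    # c_tau = 1 + sum_{j=1}^{tau-1} [alpha (1 + phi_j gamma)
    #         + d_{j,L} (1 - alpha) delta]^2, with c_1 = 1.
    c, phi_j = 1.0, 0.0
    for j in range(1, tau):
        phi_j += phi ** j                   # phi_j = phi + ... + phi^j
        d = 1.0 if j % L == 0 else 0.0      # d_{j,L}: 1 when j is a multiple of L
        c += (alpha * (1 + phi_j * gamma) + d * (1 - alpha) * delta) ** 2
    return c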

ACTIVITY 5.21

Show that the error correction form equations of the additive Holt-Winters with damped trend exponential smoothing are:

lT = lT −1 + ϕbT −1 + α[yT − (lT −1 + ϕbT −1 + snT −L )]


bT = ϕbT −1 + αγ[yT − (lT −1 + ϕbT −1 + snT −L )]
snT = snT −L + (1 − α)δ[yT − (lT −1 + ϕbT −1 + snT −L )].

DISCUSSION OF ACTIVITY 5.21

lT = α(yT − snT −L ) + (1 − α)(lT −1 + ϕbT −1 )
= αyT − αsnT −L + lT −1 + ϕbT −1 − αlT −1 − αϕbT −1
= lT −1 + ϕbT −1 + α[yT − (lT −1 + ϕbT −1 + snT −L )].

The other two equations are derived in a similar manner. Try to establish them!

5.6.3 Multiplicative Holt-Winters with damped trend

Remember that the multiplicative Holt-Winters method is appropriate for time series with a linear trend (whose level and growth
rate may change over time) and a changing (e.g. increasing) seasonal variation. The results in Section 5.5.2 imply that estimates
of the smoothing level, the growth rate and the seasonal component for the multiplicative Holt-Winters method with damped
trend are the following:

ℓT = α(yT / snT −L ) + (1 − α)(ℓT −1 + ϕbT −1 )

bT = γ(ℓT − ℓT −1 ) + (1 − γ)ϕbT −1

snT = δ(yT / ℓT ) + (1 − δ)snT −L

A point forecast made in time period T for yT +τ is:

ŷT+τ(T) = [lT + (ϕ + ϕ² + ⋯ + ϕ^τ)bT ] snT+τ−L.

A 95% prediction interval computed in time period T for yT +τ is:


ŷT+τ(T) ± z0.025 sr √cτ snT+τ−L

where sr is as for the multiplicative Holt-Winters method.

If τ = 1, then c1 = (lT + ϕbT )2 .

If 2 ≤ τ ≤ L, then

cτ = ∑_{j=1}^{τ−1} α²(1 + [τ − j]γ)²(lT + ϕj bT )² + (lT + ϕτ bT )²

where ϕj = ϕ + ϕ² + ⋯ + ϕ^j.

ACTIVITY 5.22

Show that the error correction form equations of the multiplicative Holt-Winters with damped trend exponential smoothing
are:

lT = lT −1 + ϕbT −1 + α[yT − (lT −1 + ϕbT −1 )snT −L ] / snT −L

bT = ϕbT −1 + αγ[yT − (lT −1 + ϕbT −1 )snT −L ] / snT −L

snT = snT −L + (1 − α)δ[yT − (lT −1 + ϕbT −1 )snT −L ] / lT .

DISCUSSION OF ACTIVITY 5.22

lT = α(yT / snT −L ) + (1 − α)(lT −1 + ϕbT −1 )
= lT −1 + ϕbT −1 + α[yT / snT −L − (lT −1 + ϕbT −1 )]
= lT −1 + ϕbT −1 + α[yT − (lT −1 + ϕbT −1 )snT −L ] / snT −L .
The other two equations are derived in a similar manner. Try to establish them!

ACTIVITY 5.23

Show that the no trend multiplicative Holt-Winters method is characterised by the following results:

(1) The estimates of the smoothing levels and seasonal components are:

lT = α(yT /snT −L ) + (1 − α)lT −1

and
snT = δ(yT /lT ) + (1 − δ)snT −L

(2) A point forecast in time period T for yT +τ is:

ŷT +τ (T ) = lT snT +τ −L .

(3) An approximate 95% prediction interval computed in time period T for yT +τ when 1 ≤ τ ≤ L is:

ŷT+τ(T) ± z0.025 sr √(1 + (τ − 1)α²) lT snT+τ−L.

DISCUSSION OF ACTIVITY 5.23

If there is no trend, then bT = 0 for all values of T . It follows from the results in Section 5.6.3 that:

lT = α(yT / snT −L ) + (1 − α)(ℓT −1 + ϕbT −1 )
= α(yT / snT −L ) + (1 − α)(ℓT −1 + ϕ(0))
= α(yT / snT −L ) + (1 − α)lT −1 .

The other results can be derived in a similar manner. Try to establish them!

5.7 Conclusion
This unit discussed various forecasting models that are based on exponential smoothing. Simple exponential smoothing was found
to be appropriate for time series with no trend and no seasonal variation. When a time series exhibits a linear trend, at least
locally, but no seasonal variation, Holt's trend corrected exponential smoothing method was found to be appropriate. Time series
exhibiting a linear trend and constant seasonal variation were found to be better analysed using the additive Holt-Winters method,
while time series that have a linear trend with changing seasonal variation were found to be better analysed using the multiplicative
Holt-Winters method. Finally, the damped trend exponential smoothing method was found to be appropriate for time series whose
growth rate is not sustained into the future, so that the growth rate has to be multiplied by a damping factor.
