
Forecasting using simple models
Outline
• Basic forecasting models
– The basic ideas behind each model
– When each model may be appropriate
– Illustrate with examples
• Forecast error measures
• Automatic model selection
• Adaptive smoothing methods
– (automatic alpha adaptation)
• Ideas in model based forecasting techniques
– Regression
– Autocorrelation
– Prediction intervals

2
Basic Forecasting Models

• Moving average and weighted moving average
• First order exponential smoothing
• Second order exponential smoothing
• First order exponential smoothing with trends and/or seasonal patterns
• Croston's method

3
M-Period Moving Average

P_{t+1}(t) = ( Σ_{j=0}^{M-1} V_{t-j} ) / M

• i.e. the average of the last M data points
• Basically assumes a stable (trend free) series
• How should we choose M?
– Advantages of large M?
– Advantages of small M?
• Average age of data = M/2

4
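The M-period moving average above can be sketched in a few lines of Python (the function name is my own, not from the slides):

```python
def moving_average_forecast(values, M):
    """Forecast the next period as the average of the last M observations."""
    if len(values) < M:
        raise ValueError("need at least M data points")
    return sum(values[-M:]) / M

# Example: forecast the next period from the last 3 observations
print(moving_average_forecast([10, 12, 11, 13, 14], M=3))  # average of 11, 13, 14
```

Note the trade-off the slide asks about: a large M averages out more noise but reacts slowly to real changes, since the average age of the data grows with M.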
Weighted Moving Averages

P_{t+1}(t) = Σ_{j=0}^{n} W_j V_{t-j}

• The W_j are weights attached to each historical data point
• Essentially all known (univariate) forecasting schemes are weighted moving averages
• Thus, don't screw around with the general versions unless you are an expert

5
Simple Exponential Smoothing

• P_{t+1}(t) = Forecast for time t+1 made at time t
• V_t = Actual outcome at time t
• 0 < α < 1 is the "smoothing parameter"
6
Two Views of Same Equation

• P_{t+1}(t) = P_t(t-1) + α[V_t − P_t(t-1)]
– Adjust forecast based on last forecast error

OR

• P_{t+1}(t) = (1-α)P_t(t-1) + αV_t
– Weighted average of last forecast and last actual

7
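The update rule above is easy to implement directly. A minimal Python sketch (function and variable names are my own):

```python
def exp_smooth_forecast(values, alpha, init=None):
    """One-step-ahead simple exponential smoothing:
    P[t+1] = P[t] + alpha * (V[t] - P[t])."""
    p = values[0] if init is None else init   # initialize with the first observation
    forecasts = []
    for v in values:
        forecasts.append(p)        # forecast made before seeing v
        p = p + alpha * (v - p)    # equivalently (1-alpha)*p + alpha*v
    return forecasts, p            # p is now the forecast for the next period
```

Both "views" of the equation are the same line of code: `p + alpha*(v - p)` expands to `(1-alpha)*p + alpha*v`.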
Simple Exponential Smoothing

• Is appropriate when the underlying time series behaves like a constant + noise
– X_t = μ + N_t
– Or when the mean μ is wandering around
– That is, for a quite stable process
• Not appropriate when trends or seasonality are present

8
ES would work well here

[Chart: "Typical Behavior for Exponential Smoothing" — Demand vs. Period]

9
Simple Exponential Smoothing

• We can show by recursive substitution that ES can also be written as:

P_{t+1}(t) = αV_t + α(1-α)V_{t-1} + α(1-α)²V_{t-2} + α(1-α)³V_{t-3} + …

• It is a weighted average of past observations
• Weights decay geometrically as we go backwards in time

10
Weights on past data

[Chart: weight placed on each past observation — exponential smoothing (α=0.6) vs. a 5-period moving average]

11
Simple Exponential Smoothing

F_{t+1}(t) = αA_t + α(1-α)A_{t-1} + α(1-α)²A_{t-2} + α(1-α)³A_{t-3} + …

• Large α adjusts more quickly to changes
• Smaller α provides more "averaging" and thus lower variance when things are stable
• Exponential smoothing is intuitively more appealing than moving averages

12
Exponential Smoothing
Examples

13
Zero Mean White Noise

[Charts (slides 14–16): a zero-mean white-noise series, then the same series with one-step exponential-smoothing forecasts overlaid for α=0.1 and α=0.3]
Shifting Mean + Zero Mean White Noise

[Charts (slides 17–19): a series whose mean shifts over time plus white noise, with forecasts overlaid for α=0.1 and α=0.3]
Automatic selection of α

• Using historical data
• Apply a range of α values
• For each, calculate the error in one-step-ahead forecasts
– e.g. the root mean squared error (RMSE)
• Select the α that minimizes RMSE

20
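The grid-search procedure above can be sketched as follows (function names and the grid of candidate α values are my own choices):

```python
import math

def one_step_rmse(values, alpha):
    """RMSE of one-step-ahead ES forecasts, initialized at the first value."""
    p = values[0]
    sq_errs = []
    for v in values[1:]:
        sq_errs.append((v - p) ** 2)   # error of the forecast made last period
        p = p + alpha * (v - p)        # exponential-smoothing update
    return math.sqrt(sum(sq_errs) / len(sq_errs))

def best_alpha(values, grid=None):
    """Pick the alpha on a grid that minimizes ex-post one-step RMSE."""
    grid = grid or [a / 20 for a in range(1, 20)]   # 0.05, 0.10, ..., 0.95
    return min(grid, key=lambda a: one_step_rmse(values, a))
```

This is exactly the "RMSE vs Alpha" curve on the next slide, evaluated at grid points and minimized.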
RMSE vs Alpha

[Chart: one-step forecast RMSE (roughly 1.15–1.45) as a function of α from 0 to 1]

21
Recommended Alpha

• Typically alpha should be in the range 0.05 to 0.3
• If RMSE analysis indicates a larger alpha, exponential smoothing may not be appropriate

22
23

[Chart: "Original Data" — Time Series Value vs. Period]
Actual vs Forecast for Various Alpha

[Chart: demand and one-step forecasts for α=0.1, 0.3, and 0.9 vs. Period]

24
Series and Forecast using Alpha=0.9

Might look good, but is it?

[Chart: the series and its α=0.9 forecast vs. Period]

25
Series and Forecast using Alpha=0.9

[Charts (slides 26–28): the same series and α=0.9 forecast, zoomed in to periods 1–16]
Forecast RMSE vs Alpha

[Chart: one-step forecast RMSE (roughly 0.57–0.67) as a function of α from 0 to 1]

29
Exponential Smoothing on Lake Huron Level Data, Various Alphas

[Chart: Huron level with forecasts overlaid for α=0.1, 0.3, and 0.9 vs. Period]

30
Forecast Errors for Lake Huron Data, Various Alphas

[Chart: one-step forecast errors for α=0.1 and α=0.9 vs. Period]

31
Forecast RMSE vs Alpha for Lake Huron Data

[Chart: RMSE (roughly 0.6–1.1) as a function of α from 0 to 1]

32
Monthly Furniture Demand vs Forecast, Various Alphas

[Chart: monthly furniture orders with forecasts for α=0.1, 0.3, and 0.9 vs. Period]

33
Monthly Furniture Demand Forecast Errors, Various Alphas

[Chart: forecast errors for α=0.1, 0.3, and 0.9 vs. Period]

34
Forecast RMSE vs Alpha for Monthly Furniture Demand Data

[Chart: RMSE as a function of α from 0 to 1]

35
Exponential smoothing will lag behind a trend

• Suppose X_t = b_0 + b_1 t
• And S_t = (1-α)S_{t-1} + αX_t
• Can show that

E[S_t] = E[X_t] − ((1-α)/α) b_1
36
Exponential Smoothing on a Trend

[Chart: trend data with exponential smoothing (α=0.2 and α=0.5) lagging behind]

37
Double Exponential Smoothing

• Modifies exponential smoothing for following a linear trend

Let S_t = (1-α)S_{t-1} + αX_t

Let S_t[2] = (1-α)S_{t-1}[2] + αS_t

• i.e. smooth the smoothed value

Let X̂_t = 2S_t − S_t[2]

38
Single and Double smoothed values

[Chart: trend data with single and double smoothing (α=0.5) — S_t lags the trend, and S_t[2] lags even more]
39
Double Smoothing

[Chart: trend data with 2S_t − S_t[2] overlaid — the combination doesn't lag]

40
E[S_t] = E[X_t] − ((1-α)/α) b_1

E[S_t[2]] = E[S_t] − ((1-α)/α) b_1

E[S_t] − E[S_t[2]] = ((1-α)/α) b_1

Thus estimate the slope at time t as

b̂_1(t) = (α/(1-α)) (S_t − S_t[2])
41
E[X_t] = E[S_t] + ((1-α)/α) b_1

E[X_t] = E[S_t] + ((1-α)/α)·(α/(1-α))·(E[S_t] − E[S_t[2]]) = E[S_t] + (E[S_t] − E[S_t[2]])

E[X_t] = 2E[S_t] − E[S_t[2]]

X̂_t = 2S_t − S_t[2]
42
X̂_t = 2S_t − S_t[2]

X̂_{t+τ} = X̂_t + τ b̂_1(t)

X̂_{t+τ} = 2S_t − S_t[2] + τ (α/(1-α)) (S_t − S_t[2])

X̂_{t+τ} = (2 + τα/(1-α)) S_t − (1 + τα/(1-α)) S_t[2]
43
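Putting the level estimate 2S_t − S_t[2] and the slope estimate together gives a short forecasting routine. A sketch in Python (function name is my own; initialization at the first data point is one simple choice):

```python
def double_smoothing_forecast(values, alpha, tau=1):
    """Brown's double exponential smoothing, per the slides:
    level = 2*S - S2, slope = alpha/(1-alpha) * (S - S2),
    forecast tau ahead = level + tau*slope."""
    s = s2 = values[0]                       # initialize both at the first point
    for v in values:
        s = (1 - alpha) * s + alpha * v      # single smoothing
        s2 = (1 - alpha) * s2 + alpha * s    # smooth the smoothed value
    slope = alpha / (1 - alpha) * (s - s2)
    return 2 * s - s2 + tau * slope
```

On a clean linear trend the transients die out and the forecast recovers the trend exactly, which is the point of the correction terms.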
44

Example

[Chart: a trend + noise series vs. Period]
α=0.2

[Chart: trend series data with single and double smoothing overlaid]
45
[Chart: same data — single smoothing lags a trend]
46
[Chart: same data — double smoothing over-shoots a change (must "re-learn" the slope)]
47
Holt-Winters Trend and Seasonal Methods

• "Exponential smoothing for data with trend and/or seasonality"
– Two models, Multiplicative and Additive
• Models contain estimates of trend and seasonal components
• Models "smooth", i.e. place greater weight on more recent data

48
Winters Multiplicative Model

• X_t = (b_1 + b_2 t)c_t + ε_t
• Where the c_t are seasonal terms and

Σ_{t=1}^{L} c_t = L, where L is the season length

• Note that the amplitude depends on the level of the series
• Once we start smoothing, the seasonal components may not add to L

49
Holt-Winters Trend Model

• X_t = (b_1 + b_2 t) + ε_t
• Same except no seasonal effect
• Works the same as the trend + season model, except simpler

50
• Example:

X_t = (1 + 0.04t) · c_t, where the seasonal factors c_t cycle through (1.5, 0.5, 1)
51
Xt = (1 + 0.04t)(1.5, 0.5, 1)

[Chart: the seasonal series with the trend line (1+0.04t) overlaid]
52
Xt = (1 + 0.04t)(1.5, 0.5, 1)

[Chart: the high-season points sit at 150% of the trend line]
53
Xt = (1 + 0.04t)(1.5, 0.5, 1)

[Chart: the low-season points sit at 50% of the trend line]
54
• The seasonal terms average 100% (i.e. 1)
• Thus, summed over a season, the c_t must add to L
• Each period we go up or down some percentage of the current level value
• The amplitude increasing with level seems to occur frequently in practice

55
Recall Australian Red Wine Sales

[Chart: monthly sales (roughly 500–3000) over 140 periods]

56
Smoothing

• In the Winters model, we smooth the "permanent component", the "trend component" and the "seasonal component"
• We may have a different smoothing parameter for each (α, β, γ)
• Think of the permanent component as the current level of the series (without trend)

57
Step 1. Update the Permanent Component

Let a_1(T) = b_1 + b_2 T be the permanent component.

The update step is:

â_1(T) = α [ V_T / ĉ_T(T−L) ] + (1−α) [ â_1(T−1) + b̂_2(T−1) ]
58
Step 1, annotated:

â_1(T) = α [ V_T / ĉ_T(T−L) ] + (1−α) [ â_1(T−1) + b̂_2(T−1) ]

• V_T / ĉ_T(T−L) is the current observation, "deseasonalized" by last season's factor
• â_1(T−1) + b̂_2(T−1) is the estimate of the permanent component from last time = last level + slope×1
• So â_1(T) = α(current observed level) + (1−α)(forecast of current level)
62
Step 2. Update the Trend Component

b̂_2(T) = β [ â_1(T) − â_1(T−1) ] + (1−β) b̂_2(T−1)

• â_1(T) − â_1(T−1) is the "observed" slope
• b̂_2(T−1) is the "previous" slope
65
Step 3. Update the Seasonal Component for this period

ĉ_T(T) = γ [ V_T / â_1(T) ] + (1−γ) ĉ_T(T−L)

Since V_T ≈ a_1(T) c_T(T)
66
To forecast τ ahead at time T, use current values of a, b, and c:

V̂_{T+τ}(T) = [ â_1(T) + b̂_2(T)·τ ] · ĉ_{T+τ}(T+τ−L)

• â_1(T) + b̂_2(T)·τ extends the trend out τ periods ahead
• ĉ_{T+τ}(T+τ−L) uses the proper seasonal adjustment
69
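The three update steps and the forecast equation can be sketched directly in Python. This is a minimal sketch of the Winters multiplicative recursions (function names, default smoothing constants, and the `t mod L` seasonal indexing are my own choices; real implementations also need a careful initialization phase):

```python
def winters_update(level, trend, seasonals, v, t, L, alpha=0.2, beta=0.1, gamma=0.1):
    """One Winters-multiplicative update at time t.
    seasonals is a list of L factors, indexed by t mod L (updated in place)."""
    i = t % L
    # Step 1: permanent component = alpha*(deseasonalized obs) + (1-alpha)*(level+slope)
    new_level = alpha * (v / seasonals[i]) + (1 - alpha) * (level + trend)
    # Step 2: trend = beta*(observed slope) + (1-beta)*(previous slope)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    # Step 3: seasonal factor for this position in the season
    seasonals[i] = gamma * (v / new_level) + (1 - gamma) * seasonals[i]
    return new_level, new_trend, seasonals

def winters_forecast(level, trend, seasonals, t, tau, L):
    """Forecast tau periods ahead: extend the trend, apply the right seasonal."""
    return (level + trend * tau) * seasonals[(t + tau) % L]
```

On a noiseless trend-times-seasonal series with exact starting values, the recursions simply track the series, which is a useful sanity check.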
Winters Additive Method

• X_t = b_1 + b_2 t + c_t + ε_t
• Where the c_t are seasonal terms and

Σ_{t=1}^{L} c_t = 0, where L is the season length

• Similar to the previous model except we "smooth" estimates of b_1, b_2, and the c_t

70
Croston's Method

• Can be useful for intermittent, erratic, or slow-moving demand
– e.g. when demand is zero most of the time (say 2/3 of the time)
• Might be caused by
– Short forecasting intervals (e.g. daily)
– A handful of customers that order periodically
– Aggregation of demand elsewhere (e.g. reorder points)

71
Demand Distribution

[Chart: probability mass function of per-period demand, concentrated at zero]

72
Typical situation

• Central spare parts inventory (e.g. military)
• Orders from manufacturer
– in batches (e.g. EOQ)
– periodically, when inventory is nearly depleted
– long lead times may also affect batch size

73
Example

Demand   Prob
0        0.85        Demand each period follows a
1        0.1275      distribution that is usually zero
2        0.0191
3        0.0029
4        0.0004
5        0.00006
6        0.00001
7        0.000002
74
75

Example

[Chart: "An intermittent Demand Series" — demand vs. Period]
Example

• Exponential smoothing applied (α=0.2)

[Chart: "Exponential Smoothing Applied" — the smoothed forecast vs. Period]

76
Using Exponential Smoothing:

• Forecast is highest right after a non-zero demand occurs
• Forecast is lowest right before a non-zero demand occurs

77
Croston's Method

• Separately tracks
– Time between (non-zero) demands
– Demand size when not zero
• Smoothes both time between and demand size
• Combines both for forecasting

Forecast = Demand Size / Time Between Demands
78
Define terms

• V(t) = actual demand outcome at time t
• P(t) = predicted demand at time t
• Z(t) = estimate of demand size (when it is not zero)
• X(t) = estimate of time between (non-zero) demands
• q = a variable used to count the number of periods between non-zero demands

79
Forecast Update

• For a period with zero demand
– Z(t) = Z(t-1)
– X(t) = X(t-1)
• No new information about
– order size Z(t)
– time between orders X(t)
• q = q + 1
– Keep counting time since last order
80
Forecast Update

• For a period with non-zero demand
– Z(t) = Z(t-1) + α(V(t) − Z(t-1))
– X(t) = X(t-1) + α(q − X(t-1))
– q = 1
• Update size of order via smoothing (V(t) is the latest order size)
• Update time between orders via smoothing (q is the latest time between orders)
• Reset the counter of time between orders
84
Forecast

• Finally, our forecast is:

P(t) = Z(t) / X(t) = Non-zero Demand Size / Time Between Demands
85
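The update rules above fit in one small function. A sketch in Python (the function name and the choice to initialize Z and X from the first non-zero demand are my own):

```python
def croston(demands, alpha=0.2):
    """Croston's method: smooth non-zero demand size Z and the
    inter-demand interval X separately; per-period forecast is Z/X."""
    z = x = None
    q = 1
    forecasts = []
    for v in demands:
        if v == 0:
            q += 1                          # keep counting since the last order
        else:
            if z is None:
                z, x = v, q                 # initialize from the first demand
            else:
                z = z + alpha * (v - z)     # smooth order size
                x = x + alpha * (q - x)     # smooth time between orders
            q = 1                           # reset the counter
        forecasts.append(z / x if z is not None else 0.0)
    return forecasts
```

Note the forecast only changes in periods with non-zero demand, exactly as the "Behavior" slide below describes.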
Recall example

• Exponential smoothing applied (α=0.2)

[Chart: "Exponential Smoothing Applied to Example Data" vs. Period]
86
Recall example

• Croston's method applied (α=0.2)

[Chart: "Croston's Method Applied to Example Data" vs. Period]
87
What is it forecasting?

• Average demand per period

[Chart: Croston's forecast hovering near the true average demand per period = 0.176]
88
Behavior

• Forecast only changes after a demand
• Forecast constant between demands
• Forecast increases when we observe
– A large demand
– A short time between demands
• Forecast decreases when we observe
– A small demand
– A long time between demands
89
Croston's Method

• Croston's method assumes demand is independent between periods
– That is, one period looks like the rest (or changes slowly)

90
Counter Example

• One large customer
• Orders using a reorder point
– The longer we go without an order
– The greater the chances of receiving an order
• In this case we would want the forecast to increase between orders
• Croston's method may not work too well

91
Better Examples

• Demand is a function of intermittent random events
– Military spare parts depleted as a result of military actions
– Umbrella stocks depleted as a function of rain
– Demand depending on start of construction of a large structure

92
Is demand independent?

• If enough data exists we can check the distribution of time between demands
• It should "tail off" geometrically

93
Theoretical behavior

[Chart: "Theoretical Time Between Demands Distribution" — frequency vs. time between, tailing off geometrically]
94
In our example:

[Chart: "Time Between Demands in Example" — observed frequency vs. time between]
95
Comparison

[Chart: observed vs. theoretical time-between-demands distributions]
96
Counterexample

• Croston's method might not be appropriate if the time-between-demands distribution looks like this:

[Chart: "Distribution of Time Between Demand" — frequency concentrated around 20 periods rather than tailing off]
97
Counterexample

• In this case, as time approaches 20 periods without demand, we know demand is coming soon
• Our forecast should increase in this case
98
Error Measures

• Errors: the difference between actual and predicted (one period earlier)
• e_t = V_t − P_t(t-1)
– e_t can be positive or negative
• Absolute error |e_t|
– Always positive
• Squared error e_t²
– Always positive
• The percentage error PE_t = 100 e_t / V_t
– Can be positive or negative

99
Bias and error magnitude

• Forecasts can be:
– Consistently too high or too low (bias)
– Right on average, but with large deviations both positive and negative (error magnitude)
• Should monitor both for changes

100
Error Measures

• Look at errors over time
• Cumulative measures summed or averaged over all data
– Error Total (ET)
– Mean Percentage Error (MPE)
– Mean Absolute Percentage Error (MAPE)
– Mean Squared Error (MSE)
– Root Mean Squared Error (RMSE)
• Smoothed measures reflect errors in the recent past
– Mean Absolute Deviation (MAD)

101
Of these measures:

• ET and MPE measure bias
• MAPE, MSE, RMSE, and MAD measure error magnitude
103
Error Total

• Sum of all errors

ET = Σ_{t=1}^{n} e_t

• Uses raw (positive or negative) errors
• ET can be positive or negative
• Measures bias in the forecast
• Should stay close to zero, as we saw in the last presentation

104
MPE

• Average of percent errors

MPE = (1/n) Σ_{t=1}^{n} PE_t

• Can be positive or negative
• Measures bias, should stay close to zero

105
MSE

• Average of squared errors

MSE = (1/n) Σ_{t=1}^{n} e_t²

• Always positive
• Measures "magnitude" of errors
• Units are "demand units squared"

106
RMSE

• Square root of MSE

RMSE = √[ (1/n) Σ_{t=1}^{n} e_t² ]

• Always positive
• Measures "magnitude" of errors
• Units are "demand units"
• Standard deviation of forecast errors

107
MAPE

• Average of absolute percentage errors

MAPE = (1/n) Σ_{t=1}^{n} |PE_t|

• Always positive
• Measures magnitude of errors
• Units are "percentage"

108
Mean Absolute Deviation

• Smoothed absolute errors

MAD_t = (1 − 0.3) MAD_{t-1} + 0.3 |e_t|

• Always positive
• Measures magnitude of errors
• Looks at the recent past

109
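The measures above can all be computed in one pass. A minimal sketch (function name is my own; the MAD smoothing constant 0.3 and its initialization at the first absolute error follow the slide's recursion):

```python
import math

def error_measures(actuals, forecasts):
    """Cumulative bias/magnitude measures from the slides, plus smoothed MAD."""
    e = [a - f for a, f in zip(actuals, forecasts)]
    n = len(e)
    pe = [100 * err / a for err, a in zip(e, actuals)]   # percentage errors
    mad = abs(e[0])                                      # initialize smoothed MAD
    for err in e[1:]:
        mad = 0.7 * mad + 0.3 * abs(err)
    return {
        "ET": sum(e),                                    # bias
        "MPE": sum(pe) / n,                              # bias, in percent
        "MAPE": sum(abs(p) for p in pe) / n,             # magnitude, in percent
        "MSE": sum(err ** 2 for err in e) / n,           # magnitude, squared units
        "RMSE": math.sqrt(sum(err ** 2 for err in e) / n),
        "MAD": mad,                                      # recent-past magnitude
    }
```

Note that an unbiased but noisy forecast gives ET ≈ 0 and MPE ≈ 0 while RMSE, MAPE, and MAD stay large, which is why both families are monitored.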
Percentage or Actual units

• Often errors naturally increase as the level of the series increases
• Natural, thus no reason for alarm
• If true, percentage-based measures are preferred
• Actual units are more intuitive

110
Squared or Absolute Errors

• Absolute errors are more intuitive
• Standard deviation units less so
– 66% within ±1 S.D.
– 95% within ±2 S.D.
• When using measures for automatic model selection, there are statistical reasons for preferring measures based on squared errors

111
Ex-Post Forecast Errors

• Given
– A forecasting method
– Historical data
• Calculate (some) error measure using the historical data
• Some data is required to initialize the forecasting method
• The rest of the data (if enough) is used to calculate the ex-post forecast errors and the measure

112
Automatic Model Selection

• For all possible forecasting methods
– (and possibly for all parameter values, e.g. smoothing constants – but not in SAP?)
• Compute the ex-post forecast error measure
• Select the method with the smallest error

113
Automatic α Adaptation

• Suppose an error measure indicates behavior has changed
– e.g. the level has jumped up
– The slope of the trend has changed
• We would want to base forecasts on more recent data
• Thus we would want a larger α

114
Tracking Signal (TS)

TS_t = ET_t / MAD_t

TS_t = 0 if MAD_t is zero

• Bias/Magnitude = "standardized bias"

115
α Adaptation

α_t = α_{t-1} + 0.2 |TS_t − TS_{t-1}|

or

α_t = 0.8 α_{t-1} + 0.2 |TS_t|

subject to 0.05 ≤ α_t ≤ 0.9

• If TS increases, bias is increasing, thus increase α
• I don't like these methods due to instability

116
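One step of the second adaptation rule can be sketched as follows (function name is my own; this implements α_t = 0.8α_{t-1} + 0.2|TS_t| with the 0.05–0.9 clipping, one of the candidate rules on the slide):

```python
def adapt_alpha(alpha_prev, et_total, mad):
    """One adaptive-alpha step: TS = ET/MAD ("standardized bias"),
    then move alpha toward |TS| and clip to [0.05, 0.9]."""
    ts = 0.0 if mad == 0 else et_total / mad   # tracking signal, 0 if MAD is zero
    alpha = 0.8 * alpha_prev + 0.2 * abs(ts)
    return min(0.9, max(0.05, alpha))
```

Because |TS| can jump around a lot, α can too — which is the instability complained about above.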
Model Based Methods

• Find and exploit "patterns" in the data
• Trend and Seasonal Decomposition
– Time based regression
• Time Series Methods (e.g. ARIMA Models)
• Multiple Regression using leading indicators
• Assumes series behavior stays the same
• Requires analysis (no "automatic model generation")

117
Univariate Time Series Models Based on Decomposition

• V_t = the time series to forecast
• V_t = T_t + S_t + N_t
• Where
– T_t is a deterministic trend component
– S_t is a deterministic seasonal/periodic component
– N_t is a random noise component

118
Raw Material Price

[Charts (slides 119–120): raw material price ($/Unit) over 24 periods; σ(V_t) = 0.257]
120
Simple Linear Regression Model:
Vt = 2.877174 + 0.020726t

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.569724
  R Square            0.324585
  Adjusted R Square   0.293884
  Standard Error      0.21616
  Observations        24

ANOVA
              df   SS        MS        F         Significance F
  Regression   1   0.494006  0.494006  10.57257  0.003659
  Residual    22   1.027956  0.046725
  Total       23   1.521963

                Coefficients  Std Error  t Stat    P-value   Lower 95%  Upper 95%
  Intercept     2.877174      0.091079   31.58978  7.99E-20  2.688287   3.066061
  X Variable 1  0.020726      0.006374   3.251549  0.003659  0.007507   0.033945

121
Use Model to Forecast into the Future

[Chart: "Actuals and Forecasts" — price and the fitted-line forecast extended through period 36]

122
Residuals = Actual − Predicted
e_t = V_t − (2.877174 + 0.020726t)
σ(e_t) = 0.211

[Chart: "Residuals After Regression" — residuals vs. Period]

123
Simple Seasonal Model

• Estimate a seasonal adjustment factor for each period within the season
• e.g. S_September

124
Residuals sorted by season, with season averages:

Season 1: 0.1521, 0.27992, 0.22774, 0.19557, 0.18339, 0.28121, 0.33903, 0.34685 → average 0.250726055
Season 2: −0.24863, −0.21080, −0.28298, −0.19516, −0.28734, −0.20952, −0.24170, −0.26387 → average −0.242500035
Season 3: 0.03065, −0.07153, 0.03629, 0.00411, −0.00807, −0.00024, −0.04242, −0.01460 → average −0.008226125
125
Trend + Seasonal Model

• V_t = 2.877174 + 0.020726t + S_mod(t,3)
• Where
– S_1 = 0.250726055
– S_2 = −0.242500035
– S_3 = −0.008226125

126
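The whole procedure — fit a line by least squares, average the residuals by position within the season, then forecast with trend plus seasonal adjustment — can be sketched in Python (function names are my own; t is 1-indexed as on the slides):

```python
def fit_trend_seasonal(v, L):
    """OLS line fit, then seasonal factor = average residual per season position."""
    n = len(v)
    t = list(range(1, n + 1))
    tbar, vbar = sum(t) / n, sum(v) / n
    b1 = (sum((ti - tbar) * (vi - vbar) for ti, vi in zip(t, v))
          / sum((ti - tbar) ** 2 for ti in t))          # OLS slope
    b0 = vbar - b1 * tbar                               # OLS intercept
    resid = [vi - (b0 + b1 * ti) for ti, vi in zip(t, v)]
    season = [sum(resid[i::L]) / len(resid[i::L]) for i in range(L)]
    return b0, b1, season

def forecast_ts(b0, b1, season, t):
    """Forecast for period t (1-indexed): trend plus seasonal adjustment."""
    return b0 + b1 * t + season[(t - 1) % len(season)]
```

When n is a whole number of seasons, the seasonal averages sum to zero (residuals from OLS with an intercept sum to zero), matching the additive-model constraint Σ c_t = 0.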
Actual vs Forecast (Trend + Seasonal Model)

[Chart: price and the trend + seasonal forecast extended through period 36]

127
et = Vt - (2.877174 + 0.020726t + Smod(t,3))

(et)=0.145
Residuals from Trend+Season

0.15

0.1

0.05
Residuals

0 Residuals2

-0.05

-0.1

-0.15
1

11

13

15

17

19

21

23
Period

128
Can use other trend models

• V_t = β_0 + β_1 sin(2πt/k) (where k is the period)
• V_t = β_0 + β_1 t + β_2 t² (multiple regression)
• V_t = β_0 + β_1 e^{kt}
• etc.
• Examine the plot, pick a reasonable model
• Test the model fit, revise if necessary

129
130

[Charts (slides 130–131): a noisy series containing the signal S(t) = cos(2πt/12), and the extracted signal]
131
Model: V_t = T_t + S_t + N_t

• After extracting the trend and seasonal components we are left with "the noise": N_t = V_t − (T_t + S_t)
• Can we extract any more predictable behavior from the "noise"?
• Use time series analysis
– Akin to signal processing in EE

132
Zero Mean, and Aperiodic:
Is our best forecast N̂_{t+1} = 0?

[Chart: the noise series vs. Period]
133
AR(1) Model

• This data was generated using the model
• N_t = 0.9 N_{t-1} + Z_t
• Where Z_t ~ N(0, σ²)
• Thus, to forecast N_{t+1}, we could use:

N̂_{t+1} = 0.9 N_t

N̂_{t+2} = 0.9 N̂_{t+1} = 0.9² N_t
134
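Iterating the one-step rule gives the h-step-ahead forecasts, which decay geometrically toward the zero mean. A one-line sketch (function name is my own; phi=0.9 matches the slide's model):

```python
def ar1_forecasts(n_last, phi=0.9, steps=3):
    """h-step-ahead AR(1) forecasts: N-hat(t+h) = phi**h * N(t)."""
    return [phi ** h * n_last for h in range(1, steps + 1)]
```

This is why the multi-step forecast curve on the later slide relaxes back toward zero instead of staying flat.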
AR(1): Actual vs 1-Step Ahead Forecast

[Chart: actual and one-step-ahead forecast vs. Period]

135
Forecasting N Steps Ahead

[Chart: actual series and multi-step forecasts decaying toward the mean vs. Period]

136
Time Series Models

• Examine the correlation of the time series to past values
• This is called "autocorrelation"
• If N_t is correlated to N_{t-1}, N_{t-2}, …
• Then we can forecast better than N̂_{t+1} = 0

137
Sample Autocorrelation Function

[Charts: Sample ACF and Sample PACF, lags 0–40]
138
Back to our Demand Data

[Chart: "Residuals from Trend+Season" — residuals vs. Period]
139
No Apparent Significant Autocorrelation

[Charts: Sample ACF and Sample PACF of the residuals, lags 0–40 — no significant spikes]

140
Multiple Linear Regression

• V = β_0 + β_1 X_1 + β_2 X_2 + … + β_p X_p + ε
• Where
– V is the dependent variable you want to predict
– The X_i's are the independent variables you use for prediction (known)
• The model is linear in the β_i's

141
Examples of MLR in Forecasting

• V_t = β_0 + β_1 t + β_2 t² + β_3 sin(2πt/k) + β_4 e^{kt}
– i.e. a trend model, a function of t
• V_t = β_0 + β_1 X_{1t} + β_2 X_{2t}
– Where X_{1t} and X_{2t} are leading indicators
• V_t = β_0 + β_1 V_{t-1} + β_2 V_{t-2} + β_12 V_{t-12} + β_13 V_{t-13}
– An autoregressive model

142
Example: Sales and Leading Indicator

[Charts: two series over 140 periods — sales and a leading indicator]
143
Example: Sales and Leading Indicator

Fitted model:
Sales(t) = −3.93 + 0.83·Sales(t−3) − 0.78·Sales(t−2) + 1.22·Sales(t−1) − 5.0·Lead(t)

[Charts: the same sales and leading-indicator series]
144
