Inventory Management Lecture-04
A forecast of 3.5 units is the best forecast. In the long run, no other forecast
would be better. Unfortunately, the replenishment manager doesn’t know
how demand is being generated.
Figure 4-1 shows the purchases over the past month.3
Figure 4-1 Average POS
The solid lines in Figure 4-1 are the actual purchases (POS), and the dashed
lines represent the average. In this case, a simple average is the best
forecast. A simple average is just the average of all POS, starting from the
beginning and ending with the most recent observation. Let xi be the ith
observation of a total of n observations; then the simple average is
(x1 + x2 + … + xn)/n.
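As a sketch, the simple average can be computed from simulated die-roll POS. The die-roll demand process is from the text; the code and variable names are our own illustration:

```python
import random

def simple_average(observations):
    # Average of all POS from the first observation through the most recent.
    return sum(observations) / len(observations)

random.seed(1)
# Simulate 60 days of POS generated by the roll of a fair six-sided die.
pos = [random.randint(1, 6) for _ in range(60)]

# The running simple average drifts toward the true mean of 3.5.
after_10 = simple_average(pos[:10])
after_60 = simple_average(pos)
```

With enough days, `after_60` settles near 3.5, the mean of the distribution, even though the replenishment manager never sees the die.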
By about ten days, in this case, the simple average is close to the mean of
the probability distribution. However, if you look at the first nine days, you
might think there is a positive trend even though there is not. In fact, a
number of forecasting methods would actually suggest a trend using the
first nine days of this data.
Figure 4-2 also comes from the same distribution. But in this case, the first
few days look like there is a negative trend. However, it doesn’t take long
for the simple average to get close to the mean of the distribution.
Figure 4-2 Appearance of a negative trend at first
Now let’s assume the replenishment manager does know how the demand
is being generated. For simplicity, let’s also assume that the lead time is
overnight and certain. That is, at the end of each day, any pieces of candy
sold during the day are replaced the next morning.4 Table 4-1 shows the
cumulative probability distribution of demand. The cumulative probability
up to a demand during lead time of three units is 1/6 + 1/6 + 1/6 = 0.5. So,
if the manager stocked up to three pieces of candy each day, the manager
would be in stock half the time; that is, the PPIS = 0.5.
Moving Average
The solid line is POS, and the dotted line is the simple average. Notice that
by day 60, the simple average is at about 5 units, well below the expected
value of 7 units. The dashed line is a ten-day moving average.7 The ten-day
moving average is given by the following: the average of the ten most recent
observations, (at−9 + at−8 + … + at)/10.
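The difference between the two calculations can be sketched in a few lines (our own illustration; the ten-observation window matches the text):

```python
def simple_average(observations):
    # Uses every observation from the beginning of the series.
    return sum(observations) / len(observations)

def moving_average(observations, window=10):
    # Uses only the most recent `window` observations; undefined until
    # `window` observations have accumulated.
    if len(observations) < window:
        return None
    return sum(observations[-window:]) / window

pos = [3, 6, 2, 5, 1, 4, 6, 2, 3, 5, 4, 1]   # twelve days of POS
ma10 = moving_average(pos)   # uses days 3 through 12 only
sa = simple_average(pos)     # uses days 1 through 12
```

Because the moving average drops the oldest observations, it responds faster to a shift in the level of demand than the simple average does.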
Figure 4-4 is day 11 to day 30. The solid line is POS, the dotted line is the
simple average, and the dashed line is the ten-day moving average.
Comparing Figures 4-3 and 4-4 reveals the trade-off between the two
forecasting methods: although a ten-day moving average adapts more
quickly to a change in the underlying demand, it overreacts to random
changes compared to the simple average. In Figure 4-4 we see that the
ten-day moving average is more erratic than the simple average. If the
underlying demand process doesn’t change, the simple average does very
well, but if there is a change in the underlying process, the simple average
adapts more slowly than the moving average.
Figure 4-5 illustrates the point even further. Again, this is a discrete event
simulation of days 11 through 30. The solid line is POS, the dotted line is
the simple average, the dash-dotted line is the five-day moving average,
and the dashed line the ten-day moving average. In Figure 4-4 we saw that
the ten-day moving average was more erratic than the simple average;
in Figure 4-5 we see that the five-day moving average is more erratic than
the ten-day moving average.
Figure 4-5 Comparison with five-day moving average
Naïve Forecast
If we go to a one-day moving average, it really isn’t a moving average; it is
just taking what happened the previous day as the forecast. This is referred
to as a naïve forecast (see Figure 4-6).
Simple Average
One problem with a simple average is that a serious outlier will have an
effect on the forecast for a long time. In Figure 4-6, there are no outliers.
Figure 4-7 shows an example in which there is a spike in demand on the
first day. This could simply be a data error. The spike is initially included
in the five-day moving average but is out of that calculation by day 7.
However, it remains in the calculation of the simple average forever.
Eventually its effect is negligible, but throughout the range
on Figure 4-7, it has an impact. You can see that the simple average is above
the five-day moving average and the sales from day 7 through day 16. This
illustrates a problem with the simple average—outliers have a lasting
impact that can bias the forecast for many periods into the future.
The x’s in the sixth degree polynomial are for the day. So, for example, if
you were to put a 1 into all of the x’s and evaluate the equation, you would
get a forecast of two pieces of candy. The sixth degree polynomial is much
more impressive than the simple average, and it is a better fit of the sales
data; however, we already know that the best forecast is 3.5 pieces of candy.
The real question is: How well will this forecast the future? With the simple
average, the most recent forecast is the forecast for each period into the
future. In this example, the most recent forecast of the simple average
model is 3.2 pieces of candy; therefore, the forecast for the 30th day out is
3.2 pieces, as is the forecast for the 60th day out.10 Now, if you want to
forecast the 30th day out with
the sixth degree polynomial, you put a 60 in the x variable, and you get a
number that is around negative six million. In fact, this sixth degree
polynomial quickly goes negative when it is outside the data that it was fit
to. So, this seems like a paradox: The sixth degree polynomial fits the data
much better than the simple average, but when the sixth degree polynomial
is projected into the future, it quickly performs horribly. It performs
horribly because it was fitting randomness. You can’t forecast randomness.
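The overfitting phenomenon is easy to reproduce. The sketch below (our own, using NumPy; it does not reproduce the book’s exact coefficients) fits a sixth degree polynomial to 30 days of simulated die-roll demand and then extrapolates:

```python
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(1, 31)
pos = rng.integers(1, 7, size=30)       # die-roll POS for days 1..30

# Fit a sixth degree polynomial to the thirty days of sales.
poly = np.poly1d(np.polyfit(days, pos, deg=6))

# In-sample, the polynomial has a smaller squared error than the flat
# simple-average forecast...
sse_poly = float(np.sum((poly(days) - pos) ** 2))
sse_mean = float(np.sum((pos.mean() - pos) ** 2))

# ...but extrapolated to day 60 it typically leaves the 1-to-6 range
# entirely, because in-sample it was fitting randomness.
day60 = float(poly(60))
```

The least-squares polynomial is guaranteed to fit the history at least as well as the flat mean, which is exactly why it is so tempting and so misleading.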
Measuring Uncertainty
Later in this chapter, we discuss other methods of forecasting and how they
relate to inventory management, but first we need to talk about assessing
the level of randomness, which is a function of how well you can forecast
demand from the historical sales data. We need to do this because there is a
relationship between how much safety stock we need and how much
randomness there is.
Before we do that, it is important to distinguish between variability and
randomness. For example, suppose a certain type of cinnamon roll in a
grocery store sells 90 percent of its volume on Saturday. So, suppose you
sell 0 units on Sunday, 2 units per day from Monday through Friday, and
90 units on Saturday. Also, suppose this holds day in and day out
throughout the entire year. Well, there is a lot of variability throughout the
week, but there is perfect predictability, so there is no need for safety
stock.11 So, we need to know how much uncertainty there is in sales data. In
other words, we need to know how predictable the sales are based on the
sales data. To that end, we now discuss measures of uncertainty based on
forecasts of sales.
There are many measures of forecast error, but we are going to look at bias,
mean absolute deviation (MAD), mean absolute percent error (MAPE), and
standard deviation of the forecast error (σFE). Before talking about
measuring forecast error (FE), we need to define forecast error for one
forecast. The forecast error for period i is defined as FEi = ai – fi where ai is
the actual realized sales for period i and fi is the forecast for period i.
Bias is the average of the forecast errors, and MAD is the average of the
absolute forecast errors. So, suppose bias = 2 and MAD = 4. That means
that, on average, the forecasting technique is under forecasting by two
units, but overall the forecast is off by four units. The problem with MAD
is that we cannot
compare different SKUs. For example, suppose one SKU has a MAD of 10
but sells on average 100 units per day and another SKU has a MAD of 10
but sells a 1,000,000 units per day. Clearly the latter forecast is more
accurate than the former. So MAD is not good for comparing different
SKUs. MAPE overcomes this problem.
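All four measures follow directly from the definition FEi = ai − fi. A sketch (our own; using the population form of the standard deviation is an assumption):

```python
def error_measures(actuals, forecasts):
    fe = [a - f for a, f in zip(actuals, forecasts)]   # FE_i = a_i - f_i
    n = len(fe)
    bias = sum(fe) / n                                  # average signed error
    mad = sum(abs(e) for e in fe) / n                   # mean absolute deviation
    mape = 100 * sum(abs(e) / a for e, a in zip(fe, actuals)) / n
    # Standard deviation of forecast error (population form, an assumption).
    sigma_fe = (sum((e - bias) ** 2 for e in fe) / n) ** 0.5
    return bias, mad, mape, sigma_fe

bias, mad, mape, sigma_fe = error_measures([10, 8, 12, 10], [8, 10, 10, 8])
# bias = 1.0 (under forecasting on average) while MAD = 2.0; MAPE is
# scale-free, which is what makes it usable for comparing SKUs.
```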
The point of this is that there can be trade-offs among these measures of
forecast error. Since they all contain unique information, or
characterizations of the information, it is important to understand each of
them and to avoid using them in ways that might be misleading, such as
comparing two SKUs with MAD when the SKUs have different levels of
demand.
Exponential Smoothing
We now explore another class of forecasting models that are integral to
inventory management, namely, exponential smoothing12 models. We begin
with the simplest in this genre, first order exponential smoothing.
First order exponential smoothing weights the last forecast against the
forecast error of the last period. So, for example, if the last forecast was 15
and the error that period was 5, it would adjust 15 up by some fraction of
the error 5. That is, first order exponential smoothing takes the forecast and
adjusts it by a fraction of how much it was off from actual sales. This
fraction is referred to as the smoothing constant, usually designated by the
Greek lowercase letter alpha, α, which varies between zero and one, α ∈ (0, 1).
So, if α is small, the adjustment is small, but if α is large, the adjustment
is large. If ft is the forecast for period t, and at is the actual sales for period t,
then the exponentially smoothed forecast for period t+1 is
ft + 1 = ft + α(at − ft), α ∈ (0, 1)
If there is a lot of randomness in the data, alpha should be low. If the level
of demand is changing, alpha should be higher, at least for a time. For
example, suppose you have a product with fairly level demand but a
competitive new item is being introduced. Then alpha should be higher for
a time, until the competitive effects cause the demand for the item to be at a
new level. Alpha should usually be between 0.1 and 0.3, but it actually
depends on the data and situation. Let’s consider the extremes. Suppose
alpha is zero; then you always use the first forecast.
ft + 1 = ft + 0(at − ft)
ft + 1 = ft
ft + n = ft
On the other hand, suppose alpha is equal to one; then you have the naïve
forecast.
ft + 1 = ft + 1(at − ft)
ft + 1 = ft + at − ft
ft + 1 = at
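Both extremes, and the 15-with-error-5 example above, can be checked with a one-line update function (our sketch; the α of 0.2 in the first call is our own choice, not the text’s):

```python
def smooth(forecast, actual, alpha):
    # f_{t+1} = f_t + alpha * (a_t - f_t), with alpha between 0 and 1.
    return forecast + alpha * (actual - forecast)

# The text's example: last forecast 15, error 5, adjusted by a fraction of 5.
f_next = smooth(15, 20, 0.2)     # 15 + 0.2 * 5 = 16.0

# alpha = 0 never adjusts: the initial forecast persists forever.
frozen = smooth(15, 20, 0.0)
# alpha = 1 is the naive forecast: the next forecast equals the last actual.
naive = smooth(15, 20, 1.0)
```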
One challenge associated with using first order exponential smoothing is
that you have to start with a previous forecast. So, for the first forecast you
need a previous forecast. If you have data, you could use an average. If you
don’t have data, you could make an estimate with your judgment or you
could get a panel of experts and use an average of their estimates. Whatever
you start with will have a significant impact for quite a while, especially if
you are using a low alpha. For example, suppose you are forecasting POS
generated from the roll of a die, but as before, you do not know how the
demand is being generated. Suppose you start with an estimate of one piece
and are using an alpha of 0.1.
Figure 4-11 is a graph of the forecast error over 60 days. You can see that
for the first third, the bias is positive; that is, we are under forecasting.
That is because we started with an estimate that was too low, and since
alpha is low, it took about 20 days, in this example, to overcome this low
initial estimate. Now, let’s change the alpha to 0.5.
Figure 4-11 Forecast error with exponential smoothing
In Figure 4-12, there is no clear bias in the first 20 days. This is a result of
having a higher alpha; it allows the forecast to adjust more quickly.
So, back to the problem of not having any data for the first forecast. One
solution is to start with an estimate but to keep alpha on the higher end of
the range and slowly adjust it back to a lower level.
So far we have only looked at a one period ahead forecast. What is the
forecast if you are using first order exponential smoothing and you need a
forecast ten periods into the future? It turns out that the most recent
forecast is the forecast for all future periods. So, if the first order
exponentially smoothed forecast for the next period is 34, then at this point
in time, the first order exponentially smoothed forecast for period 20 is 34
units. Of course, as we move forward in time, this will change. By the time
we get to period 19, the forecast for period 20 might be different from 34
units, but that is what it would be in the current period given that the
forecast for the next period is 34 units.
Let’s look at an example. Suppose the forecast for the current period is 20,
actual sales were 30, and alpha is 0.1. Then the forecast for the next
period is
fperiod2 = fperiod1 + 0.1 * (aperiod1 − fperiod1)
fperiod2 = 20 + 0.1 * (30 − 20)
fperiod2 = 20 + 1 = 21
That is also the forecast for the period after that:
fperiod3 = 21
However, suppose now that we come to the end of the next period and
actual sales turned out to be 11.
fperiod3 = fperiod2 + 0.1 * (aperiod2 − fperiod2)
fperiod3 = 21 + 0.1 * (11 − 21) = 21 − 1 = 20
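The arithmetic can be checked directly (a sketch using the numbers from the example):

```python
alpha = 0.1

# End of Period 1: the forecast was 20 and actual sales were 30.
f2 = 20 + alpha * (30 - 20)
f3_at_end_of_p1 = f2                        # flat forecast for all future periods

# End of Period 2: actual sales turned out to be 11, so revise.
f3_at_end_of_p2 = f2 + alpha * (11 - f2)
```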
So, the Period 3 forecast at the end of Period 1 is 21, but the Period 3
forecast at the end of Period 2 is 20. In addition to making a decision about
the initial forecast and the level of alpha, another question that has to be
addressed in first order exponential smoothing is the following: How often
should the forecast be updated? In the example we just examined, it was
being updated every period, but that may not be practical or optimal. It
might not be practical because you cannot change your order every period,
for example. It might not be optimal because it creates too much erratic
behavior in the system. Again, this is an issue that can be analyzed with
discrete event simulation.
The alpha that yields a bias closest to zero may not be the same as the one
that minimizes the standard deviation of forecast error over a given period
of time. A high bias, meaning you are under forecasting, may result in an
expected demand during the protection period being too low, resulting in
more stockouts. An alpha that is too high will cause the standard deviation
of forecast error to be high, resulting in more safety stock. The point here is
that your initial estimate and your selection of alpha both have a lasting
impact on the performance of your inventory management. Discrete event
simulation is a great tool for assessing these decisions, their impact on
forecasting performance, and the resulting impact on inventory
management performance.
First order exponential smoothing assumes there is no trend or seasonality
in the demand. If there is an upward trend and first order exponential
smoothing is used, there will be a positive bias, since the forecast will on
average be too low. Increasing alpha will reduce the bias because the
forecast will adjust up more quickly, but there will still be a bias. Similarly,
if there is a downward trend, there will always be a negative bias, since you
will on average be forecasting too high.
Here we are taking the previous forecast of the level and adjusting it by a
fraction of the error, similar to what we did with the first order exponential
smoothing formula.
The trend component can be thought of as a forecast in the change of the
level of demand. So, the updating of the forecast in the change in the level
in demand is an adjustment to the previous trend estimate based on a
fraction of the error.
Tt = Tt − 1 + β([Lt − Lt−1] − Tt−1)
Again, it is instructive to look back at the formula for first order exponential
smoothing as well as the formula for the estimate of the level in the second
order exponential smoothing. You see the pattern: the previous estimate
plus a fraction of the error. We can think of [Lt − Lt−1] as the actual change
in the level and Tt – 1 as the previous forecast of the change in the level. So,
the difference is the error in the previous estimate.
Let’s look at an example. Suppose actual sales in the previous period were
110 and actual sales this period were 120; the previous trend estimate was
1 and the previous level estimate was 111; and alpha and beta are both 0.1.
Lt = (at−1 + Tt−1) + α[at − (at−1 + Tt−1)]
Lt = (110 + 1) + 0.1[120 − (110 + 1)]
= 111 + 0.1[9] = 111.9 ≈ 112
Tt = Tt – 1 + β([Lt − Lt−1] − Tt−1)
Tt = 1+0.1([112 − 111] − 1) = 1
Now, let’s forecast 10 periods into the future.
ft + n = Lt + nTt
ft + 10 = 112 + 10 * 1 = 122
Suppose we wanted to forecast 365 periods out, then
ft + 365 = 112 + 365 * 1 = 477
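The whole example can be verified in a few lines (our sketch follows the text’s equations, including its rounding of the level to 112):

```python
alpha = beta = 0.1
a_prev, a_now = 110, 120     # actual sales, last period and this period
L_prev, T_prev = 111, 1      # previous level and trend estimates

# Level update, as written in the text.
L = (a_prev + T_prev) + alpha * (a_now - (a_prev + T_prev))
L = round(L)                 # the text rounds 111.9 up to 112

# Trend update: the previous trend plus a fraction of its error.
T = T_prev + beta * ((L - L_prev) - T_prev)

# Linear trend forecast n periods out: f_{t+n} = L_t + n * T_t.
f_10 = L + 10 * T
f_365 = L + 365 * T
```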
This doesn’t seem reasonable. It seems more plausible that at some point
the trend contribution should increase at a decreasing rate.
Damped Trend
The way of dealing with this is through damped trend14 adjusted
exponential smoothing. This uses a damping factor, ϕ ∈ (0, 1).
Let’s use the numbers from the previous example but apply damped trend
adjusted exponential smoothing forecasting 10 periods out and let ϕ = 0.9.
Recall that before we were multiplying by 365. The damping factor selected
has a large impact on how quickly this sum converges. If we had set
ϕ = 0.98, then we would have multiplied by 49 instead of 9, but that is still
significantly less than 365.
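With a damped trend, the n in the linear forecast is replaced by the sum ϕ + ϕ² + … + ϕⁿ (this is the standard damped-trend formulation; the sketch is ours):

```python
def damped_multiplier(phi, n):
    # Replaces n in f_{t+n} = L_t + n * T_t with phi + phi^2 + ... + phi^n.
    return sum(phi ** i for i in range(1, n + 1))

# For phi = 0.9 the sum converges to phi / (1 - phi) = 9, so a forecast
# 365 periods out adds about 9 * T_t instead of 365 * T_t.
m_09 = damped_multiplier(0.9, 365)
# For phi = 0.98 it converges to 49 instead, still far less than 365.
m_098 = damped_multiplier(0.98, 365)

# The text's example: level 112, trend 1, damped 365 periods out.
f_365_damped = 112 + m_09 * 1   # ~121 instead of 477
```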
The vertical axis of Figure 4-14 is the cumulative damping multiplier, the
horizontal axis is n, and the lines, starting with the lowest, represent
ϕ = 0.90, 0.91, 0.92, 0.93, 0.94, and 0.98, respectively. You can see that for
ϕ = 0.90 through 0.94, the lines converge fairly quickly, around 30 to 45
periods out. For ϕ = 0.98, convergence is much slower, but even at n = 365
the multiplier is only 49. This method is important for preventing
ridiculously high forecasts way out into the future.
Figure 4-14 Damping factor
In Figure 4-16 we have a downward trend, and we see again that the
downward trend of the multiplicative seasonal trend model (solid line) is
more dramatic than the multiplicative seasonal damped trend model
(dotted line). In fact, as we forecast out further into the future we see less
dramatic swings in the nondamped model, the opposite of what we saw
with the upward trend.
Here again, we are taking the previous estimate of the seasonal factor and
adjusting it by a fraction of the error.
Notice that for first order exponential smoothing and for additive
seasonality with no trend, the last estimate is only the level, but with
trend adjusted exponential smoothing and the multiplicative seasonal
model with trend, it is the last level plus the trend. The reason is that
the previous level plus the previous trend is the estimate of what the level
would be the next period, and that is precisely what must be adjusted based
on the error. Now, the additive seasonality model and the multiplicative
seasonal model can include trend or not. In our exposition we did not
include trend with the additive seasonal model but we did with the
multiplicative model; that choice was arbitrary.
Now, the trend equation for both the trend adjusted exponential smoothing
and the multiplicative seasonal model with trend are the
same: Tt = Tt−1 + β([Lt − Lt−1] − Tt−1). In addition, if we had included a trend
component in the additive seasonal model, it would be this same equation.
The seasonal components are similar except in the case of additive
seasonality; the seasonal component is in terms of the units demand,
whereas in the case of multiplicative seasonality, the seasonal component is
in terms of a ratio. Other than that they are the same:
Recall that the forecast n periods ahead for the additive seasonal model was
ft + n = Lt + St + n−p
If it had included a linear trend, it would have been
ft + n = Lt + nTt + St + n−p
whereas for the multiplicative seasonal model with trend it is
ft + n = (Lt + nTt) St + n−p
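A sketch of the two forecast equations (our own; how the stored seasonal factors are indexed across the horizon is an assumption):

```python
def additive_forecast(L, T, n, seasonal):
    # f_{t+n} = L_t + n * T_t + S_{t+n-p}; `seasonal` holds the most
    # recent p seasonal estimates, cycled across the horizon n.
    return L + n * T + seasonal[(n - 1) % len(seasonal)]

def multiplicative_forecast(L, T, n, seasonal):
    # f_{t+n} = (L_t + n * T_t) * S_{t+n-p}
    return (L + n * T) * seasonal[(n - 1) % len(seasonal)]

# Level 100 with a trend of 37.5 per month, so month 24 is near 1,000.
add_24 = additive_forecast(100, 37.5, 24, [10] * 12)         # bump stays +10
mul_24 = multiplicative_forecast(100, 37.5, 24, [1.10] * 12)  # 10% lift scales
```

Even when the trended level reaches 1,000, the additive model still adds only 10 units, while the multiplicative model's 10 percent lift grows with the level.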
One thing to notice is that with the additive model with trend, regardless of
the size of the trend, the seasonal quantities stay the same. That doesn’t
seem reasonable. Imagine you are at 100 units per month and the seasonal
factor is 10. Now imagine that at 24 months out it is forecasting 1,000 units
per month. It will still have that seasonal factor of 10. On the other hand,
with the multiplicative model we find that the seasonal factors are
magnified as n grows. That might be reasonable to a point, but it might over
magnify eventually. One solution to this is to use the damped trend. We
have already shown how the damped trend is incorporated into the
multiplicative model, but here is how it is incorporated into the additive
model.
Seasonal indices are usually based on very little data. Suppose p = 12
months for St + n − p. It is unusual to have two years of representative data.
There are situations where there are many years of representative data, but
that is often the exception. If you only have two years of data, each
seasonal index is based on just two observations of that particular season.
It is possible that the
uncertainty introduced by using seasonal factors that are full of stochastic
error could be worse than the benefit it brings. So you could instead use a
trend adjusted forecast with a high beta so that it would adjust quickly to
the seasonality, but using a high beta will make it incorporate more
stochastic noise into the estimate of the trend. Or you could use a
level method such as first order exponential smoothing. If you have
seasonality and use a level model, you might have (1) too much16 safety
stock throughout the year, (2) too little cycle stock during the peak season,
and (3) too much cycle stock during the trough.
The solid line in Figure 4-17 is a deterministic additive seasonal model with
a level of 100. The dotted line is the same deterministic model but with a
stochastic term added to it that is simulated from a normal distribution
with a mean of zero and a standard deviation of 20. The only difference
between the graph on the top and the graph on the bottom is that they are
two different discrete event simulations from the same distribution; the
deterministic model is the same in both graphs. These graphs demonstrate
the challenge associated with finding the seasonal components from just
two years of data—that is, with even a little stochastic disturbance, the
estimates will be off.
Figure 4-17 Seasonality with standard deviation equal to 20
Figure 4-18 is identical to Figure 4-17, except that the stochastic term has a
standard deviation of 50 instead of 20.
Figure 4-18 Seasonality with standard deviation equal to 50
We can see from this that the estimates of the seasonal components would
be completely unreasonable. The key point of this is that great care must be
taken in applying seasonal models. In fact, in Figure 4-18, a first order
exponential smoothing model would outperform a seasonal model when
forecasting into the next year. The seasonal components of the seasonal
models would add more noise into the forecasting model that would detract
from the forecast accuracy. Figure 4-18 might be the level of uncertainty
you would see for an item in one store where you sell about 100 units on
average per month. However, if you were to average sales from 200 stores
with the same seasonality, you might be able to estimate the seasonal
component of the time series more accurately.
This idea is even more exaggerated when trying to estimate both trend and
seasonal components as illustrated in Figure 4-19.
In Figure 4-19, the solid line is the deterministic model, and the dotted line
is the same model with a stochastic term added to it, simulated from a
normal distribution with a mean of zero and a standard deviation of 50. As
you can see again,
the seasonality would be difficult to detect in both graphs. The dashed lines
are the regression lines for the corresponding lines. (The thin dashed line
corresponds to the solid line, and the bold dashed line corresponds to the
dotted line.) The regression model uses the month number as the
independent variable and the level of line as the dependent variable. Hence,
the thin dashed line is the actual trend component and the bold dashed line
is the estimated trend component. In general, it is easier to estimate the
trend component than the seasonal components of a time series.
Consequently, in these two examples, if you used trend adjusted
exponential smoothing with a damped trend, being off on the initial
estimate of the trend would not have as deleterious an effect as it would
if the trend were not damped.
Figure 4-20 is identical to Figure 4-19, except that the stochastic term has a
standard deviation of 20 instead of 50. In Figure 4-20, it is still difficult to
estimate the seasonal factors, but the trend estimates are very close to the
actual trend components.
Figure 4-20 Seasonality and trend with standard deviation equal to 20
Figure 4-21 shows simulated demand from a process that has only a level
component. The level component is 10, and the stochastic term is from a
normal distribution with a mean of zero and a standard deviation of 5. In
this example, there appears to be a trend but obviously, based on the
underlying model, there is no trend.
Figure 4-22 is simulated data from the same demand distribution
from Figure 4-21 except that it is simulated for 40 periods. Of course this is
just one discrete event simulation, but it illustrates the point that even at 40
periods, you could detect a trend that does not exist.
Figure 4-22 Another illusion of trend
It sounds as if we have shown you trend and seasonal models and now we
are saying you probably shouldn’t use them, but that is not the case. We are
simply explaining an important caveat in using trend and seasonal models.
In Figure 4-22, if we were to use damped trend, the effect of the small trend
we seemed to detect (that didn’t really exist) would be minimized. However,
it is a good idea to go further than just using damped trend. In forecasting it
is good to use logic and other empirical data prior to developing forecasting
models. Is there a reason to believe that trend and seasonality exist? Why?
What is the logic? Answering these questions may require getting others
involved, possibly from sales and marketing. In addition, there may be an
upward trend if you are gaining market share. Such information can be
obtained from companies that sell syndicated sales data such as AC Nielsen.
CAUSAL MODELS
There are other types of time series forecasting models, but we now go on to
talk some about causal models, and in particular, regression models.
Building effective regression models requires more skill than building time
series smoothing models. We use regression later in the book for other
purposes, so it is worthwhile to learn it here for multiple reasons.
Regression is a complex topic, and we just skim the surface and discuss it
from an applied perspective.
Regression
To build a regression model you must define your dependent and
independent variables. Since we are forecasting sales, the dependent
variable is sales. Regression finds a line that fits the data by minimizing the
sum of the squared errors, where the errors are forecast errors. As we
discussed earlier, the forecast error for period i is defined as FEi = ai − fi,
where ai is the actual realized sales for period i and fi is the forecast for
period i. In
regression, they are not referred to as forecast errors, but are referred to as
residuals, because most of the time regression is not used for forecasting
but for testing hypotheses. In particular, regression minimizes the sum of
the squared residuals for n observations.
ln Ft = ln a + b1 ln t + b2 ln Advertising + b3 ln Promotion + b4 ln Price
This is similar to how we estimated the regression for the power model of
trend. Now, suppose we want to take into account the Christmas season. To
do so we can use a dummy variable:
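A hedged sketch of such a model (the data, coefficients, and variable names here are entirely hypothetical, and the use of NumPy least squares is our own choice, not the book’s):

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(1.0, 49.0)                    # 48 months of hypothetical data
christmas = (t % 12 == 0).astype(float)     # dummy: 1 in December, else 0

# Hypothetical sales: a power trend in t plus a December lift, with noise.
sales = 50 * t ** 0.3 * np.exp(0.4 * christmas + rng.normal(0, 0.05, t.size))

# Regress ln(sales) on ln(t) and the dummy, by ordinary least squares.
X = np.column_stack([np.ones_like(t), np.log(t), christmas])
coef, *_ = np.linalg.lstsq(X, np.log(sales), rcond=None)
ln_a, b1, b_christmas = coef   # b_christmas estimates the December lift
```

Because the model is in logs, the dummy coefficient shifts December sales by a multiplicative factor of roughly e to the power b_christmas.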
ENDNOTES
1. One of the best repositories of information on forecasting is
http://www.forecastingprinciples.com/.
2. The roll of a six-sided die is a discrete uniform distribution, where each
outcome is equally likely.
3. You can simulate the roll of a die in Excel using the following function:
=RANDBETWEEN(1,6). In general, =RANDBETWEEN(a,b) generates
random numbers between a and b using a uniform distribution, meaning
that each of the outcomes is equally likely.
4. This could happen if they were coming out of the backroom and put on
the shelf in the morning.
5. Which has plenty of candy, in our example.
6. Recall, ILFR is item-level fill rate.
7. The ten-day moving average doesn’t start until the 11th day because it
takes ten days of demand to get started.
8. Armstrong, Jon Scott, ed. Principles of Forecasting: A Handbook for
Researchers and Practitioners. Vol. 30. New York: Springer, 2001.
9. You will notice R square below the equation. We discuss that later in the
chapter.
10. We are again assuming that the replenishment manager does not know
how the demand is being generated.
11. Assuming there is no uncertainty in the lead time.
12. Brown, Robert G. Exponential Smoothing for Predicting Demand.
Cambridge, MA: Arthur D. Little, 1956.
13. Holt, Charles C. “Forecasting Seasonals and Trends by Exponentially
Weighted Moving Averages.” International Journal of Forecasting 20.1
(2004): 5-10.
14. Taylor, James W. “Exponential Smoothing with a Damped
Multiplicative Trend.” International Journal of Forecasting 19.4 (2003):
715-725.
15. Winters, Peter R. “Forecasting Sales by Exponentially Weighted Moving
Averages.” Management Science 6.3 (1960): 324-342.
16. Too much in the sense that if you took seasonality into account in an
effective way, there would be less forecast error. The challenge is that
having the seasonal component per se might introduce more forecast error.
17. Eventually there should be diminishing returns to advertising.
18. Alternatively, we could say that for a 10 percent increase in spending on
advertising, there will be a 3 percent increase in sales.